Most Popular
1500 questions
4
votes
2 answers
Targeted NGS, up to 99% of reads have been marked as duplicates
Currently I'm performing a whole analysis (pipeline from *.fastq to *.vcf) of 41 samples (targeted NGS). I rely on the GATK best practices, with some modifications. I decided to use the following tools:
#mapping
bwa mem (with mem - alternate…
Adamm
- 206
- 2
- 11
4
votes
1 answer
Preparing binary matrix data for Scikit classification algorithms
I made this post on regular Stack Overflow, but I was told about this awesome feature by @nbryans.
I am a researcher (my programming knowledge is limited) conducting analysis on a set of antibiotic (methicillin) resistant and a set of antibiotic…
Daniel Harris
- 303
- 2
- 7
4
votes
1 answer
How to quantify similarity of genomes and find differences in set of S aureus genomes?
I have around 500 annotated proteomes of different bacterial strains and would like to quantify their similarity (or difference). I found gt genomediff from genometools gives me some scores that I can use to generate nice clusters, but I am not sure…
Soerendip
- 1,295
- 11
- 22
4
votes
1 answer
Error in as.vector(x) : no method for coercing this S4 class to a vector
I tried to run the following code in RStudio. Everything worked fine except the last step [write.table(mdat, "recount_mdat.csv")]: when I tried to export 'mdat', I got the following error:
Error in as.vector(x) : no method for coercing this…
Priya
- 351
- 1
- 3
- 8
4
votes
2 answers
Installing DESeq2 in Ubuntu
I am trying to install DESeq2 on Ubuntu with R version 3.5.1. According to the Bioconductor package repository, the version should be 3.5.
> R.version
platform x86_64-pc-linux-gnu
arch x86_64
os …
aerijman
- 645
- 5
- 14
4
votes
1 answer
Gene Ranking - signal to noise ratio used in GSEA-P algorithm?
I'm looking at the Broad Institute's original GSEA-P algorithm R script, which I downloaded here: http://software.broadinstitute.org/gsea/downloads.jsp.
I'm trying to adapt their GSEA.1.0.R script to process datasets that have 1 gene expression profile as…
lrthistlethwaite
- 141
- 3
4
votes
2 answers
Plotting coverage of annotation over collection of region
I'm trying to plot "meta" coverage of annotation, i.e. features (e.g. gene class) over certain regions. It is similar to read coverage plots over the gene body, except my input is two bed files (both in BED6 format) - (A) one containing the regions for…
Siddharth
- 345
- 2
- 12
4
votes
4 answers
Database for proteome-wide predictions of protein structures
The accuracy of protein structure predictors has improved considerably in recent years. Algorithms such as Rosetta have become robust enough to predict the structures of a large number of proteins. However, I couldn't find any initiative to make a database…
user345394
- 675
- 6
- 20
4
votes
1 answer
Prediction of prokaryotic origins of replication (ORI)
I want to predict origins of replication (ORI) on hundreds of prokaryotic genomes. The most straightforward solution would be to use the most commonly used tool, Ori-Finder.
It uses integrated gene prediction, analysis of base composition asymmetry,…
MrTomRod
- 191
- 1
- 4
4
votes
2 answers
"Sequence Duplication Levels" module still fails after pre-processing Illumina data
Why are the sequence duplication levels still high after trimming with Trimmomatic? I am using the following Trimmomatic operations: HEADCROP = 19 TRAILING = 20 MINLEN = 66.
How can I solve this problem? Thank you.
yy97
- 43
- 1
- 3
4
votes
1 answer
Can blat use more than one core/CPU to speed up the alignment?
I am using BLAT to align two versions of the genome of C. elegans. I can see in the Activity Monitor of my MacBook Pro (High Sierra) that blat is using 100% of a CPU. However, is this programme able to use more than one core / CPU to speed up the…
Biomagician
- 2,459
- 16
- 30
4
votes
1 answer
Question on nanopore sequencing data process pipeline (cDNA-PCR)
I recently started analysing nanopore sequencing data. While searching for help with pre-processing the data, I found the nice pipeline you set up here:…
Jungwoo Lee
- 43
- 2
4
votes
2 answers
Why are there missing calls in a VCF file from exome sequencing?
My data is a VCF file generated by an exome sequencing variant calling pipeline. I'm not very familiar with the sequencing and variant calling process. I noticed that there are some missing genotypes, which are recorded as "./." in the GT field. From…
Yan
- 143
- 4
4
votes
1 answer
How to export web NCBI tBLASTn results in table format with many queries?
Context
I'm an MSc student working on writing up my thesis (back home now) from my laptop and, therefore, unfortunately don't have access to a workstation/server capable of doing the tBLASTn search that I wanted to do. As a result I have been trying…
user3883
- 41
- 1
4
votes
1 answer
What kind of "gff" format does bioawk parse?
I was wondering if I could use the gff parsing capability of bioawk to facilitate the parsing of gtf files, and I looked at the following help message:
$ bioawk -c help
bed:
1:chrom 2:start 3:end 4:name 5:score 6:strand 7:thickstart 8:thickend…
bli
- 3,130
- 2
- 15
- 36