Highest Voted Questions - Bioinformatics Stack Exchange

7

votes

1 answer

Where to download JASPAR TFBS motif bed file?

I am interested in determining whether any transcription factor binding site motifs are enriched in some BED files from a DNA methylation experiment. I am looking for a database that has BED files containing regions enriched for specific transcription…

asked Sep 05 '17 at 21:44

Reilstein

367
1
14

7

votes

1 answer

Using the t-SNE algorithm on microarray data + an error bonus

I'm trying to use the t-SNE algorithm on some microarray data. More specifically, my data frame has 18600 columns with genes (features) and 72 rows with conditions with replicates (10xWt, 10xTg, etc.). The expression values are in log2…

asked Sep 02 '17 at 17:53

J. Doe

575
3
11

7

votes

2 answers

Filtering step for read counts data

I have around 1200 samples as columns and 60,000 genes with HTSeq-count data. Before normalization with the voom function, I want to do a filtering step. I want to remove genes whose expression is == 0 in at least 10 samples. Can I do this with read…

asked Sep 01 '17 at 12:03

stack_learner

1,262
14
26

7

votes

1 answer

What will be an appropriate mathematical distribution for SNP data?

I found that several papers describe SNPs as following a binomial distribution with the probability of "success" equal to the minor allele frequency. However, in my experiments, when I generate a SNP array following this distribution, the simulation results…

asked Sep 01 '17 at 02:55

Haohan Wang

521
3
8

7

votes

1 answer

STAR-long parameters for aligning RNA ONT reads to genome

Are there any suggested parameters to align ONT reads to the reference genome using STAR-long? For now, I used the parameters suggested here, but I noticed a weird behaviour. I have RNA reads (D. melanogaster) from R7 and R9 flowcells, separately.…

asked Aug 30 '17 at 22:25

aechchiki

2,676
11
34

7

votes

3 answers

RIP-seq analysis?

Given an experiment consisting of an input (baseline RNA) and IP (pulldown to find RNAs bound to a certain protein of interest)... Is a DE analysis performed over the RNA-seq data from the samples (let's say with edgeR or DESeq2) suitable to reveal the…

asked Aug 24 '17 at 21:26

Kraken

405
2
9

7

votes

2 answers

Is it wise to use RepeatMasker on prokaryotes?

I'm looking for a way to identify low complexity regions and other repeats in the genome of Escherichia coli. I found that RepeatMasker may be used for example when drafting genomes of prokaryotes (E. coli example). But RepeatMasker works on a…

asked Aug 24 '17 at 13:42

Titouan Bougouin-Laessle

75
6

7

votes

1 answer

samtools mpileup empty when filtering out flags

I produced a BAM file by aligning reads to a small set of synthetic sequences using bwa-mem. I am heavily filtering reads that are not paired and of a certain orientation. Applying the filtering, I get a few thousand reads: samtools view -h…

asked May 24 '17 at 15:37

719016

2,324
13
19

7

votes

3 answers

Is there a way to retrieve several SAM fields faster than `samtools view | cut -f`?

I am constructing a bit of software which pipes the output of the BAM file via samtools view into a script for parsing. My goal is to (somehow) make this process more efficient, and faster than samtools view. I am only using 3-4 fields in the BAM.…

asked Aug 22 '17 at 20:51

ShanZhengYang

1,691
1
14
20

7

votes

2 answers

Correct for gene length or read counts in GO enrichment analysis

It is a well-reported fact that GO analysis of RNA-seq results is affected by a number of biases, including length bias and expression level bias. The Bioconductor package goseq allows you to correct for these biases. By default it corrects for…

asked Aug 21 '17 at 12:05

Ian Sudbery

3,311
1
11
21

7

votes

1 answer

Split FASTQ and matching BAM into matching chunks

I am running a slow downstream analysis on a large set of nanopore reads (approx 3 million), and would like to split them into smaller chunks, run the analysis massively in parallel, and then recombine. Originally I just split the FASTQ into chunks,…

asked Aug 17 '17 at 02:33

Scott Gigante

2,133
1
13
32

7

votes

1 answer

Where can I find summary data for how common certain mutation types are?

I'd like to know how common certain mutation types are in public data sets like the 1000 Genomes, ExAC, and ESP6500. Specifically, I'd like to know the distribution of stop-gains, stop-losses, frameshift, and other mutation types. For example, what…

public-databases

asked Aug 12 '17 at 22:46

Mark Ebbert

1,354
10
22

7

votes

1 answer

The effects of incomplete bisulfite conversion upon mapping efficiency

This question has also been posted on Biostars. I have sequenced numerous multiplexed pools of BS amplicon-seq libraries derived from human samples on a MiSeq over the past few weeks. I have been utilising trim-galore and Bismark for alignment and am…

asked May 24 '17 at 06:41

David Ross

313
2
5

7

votes

1 answer

How to calculate overall reference coverage with MUMmer?

Is the MUMmer suite capable of calculating reference sequence coverage statistics for all query sequences collectively? It would be possible to achieve by parsing the output of nucmer / show-coords / show-tiling but it seems like there should be a…

asked Aug 03 '17 at 09:29

bedeabc

248
1
6

7

votes

3 answers

Extract nanopore read ID & start times from fastq file

I have a FASTQ file from a MinION (Albacore) that contains information on the read ID and the start time of each read. I want to extract these two bits of information into a single CSV file. I've been trying to figure out a grep/awk/sed solution, but…

asked Jul 28 '17 at 08:30

roblanf

962
7
15
