Highest Voted Questions - Bioinformatics Stack Exchange

5

votes

1 answer

Bash script error at paste command

I wrote a script for pasting rsIDs onto CADD output. Here is the script: #!/bin/bash cd tmp cut -f 1,2 CADD.tsv > fileA paste fileA <(cut -f 2,125 CADD.tsv) > myNewFile bedtools intersect -a myNewFile -b New.vcf -wb |cut -f 1-4,7 > CADD.rsids.tsv I have…

asked Dec 04 '18 at 10:08

Sarah

105
5

5

votes

2 answers

What does an FDR value of 1 in RNA-seq mean?

I am looking at the supplemental data from the paper "An allelic series of miR-17∼92-mutant mice uncovers functional specialization and cooperation among members of a microRNA polycistron", which lists the genes that are differentially expressed…

differential-expression

asked Dec 01 '18 at 01:50

leah

5

votes

3 answers

bcftools filtering all files in a directory

Probably a silly oversight on my part, but I'm trying to filter all the vcfs in a directory with bcftools using a simple loop. My basic command is working fine: bcftools filter -i 'QUAL > 1000' -o filter/file1out.vcf file1in.vcf but when I try to…

asked Nov 30 '18 at 23:03

Joanne

305
1
5

5

votes

1 answer

Does the ".full.aln" file produced by snippy-core contain all bases of my input sequences aligned to the reference genome?

I have a number of sequences and a reference genome. I used snippy to align each individual sequence with the reference genome. I then used snippy-core on the *.aln files from snippy which produced a new *.aln file, and a *.full.aln. It is my…

asked Nov 24 '18 at 19:17

Fabien

51
2

5

votes

2 answers

full visualisation of draft genomes alignment

I have two draft genomes (approx. 500 Mb each), one with ~10'000 scaffolds (genome1.fasta) and one with ~1'000 contigs (genome2.fasta). I would like to visualize their pairwise alignment, which, as I found a while ago, can easily be done with mummerplot…

asked Nov 22 '18 at 15:02

aechchiki

2,676
11
34

5

votes

5 answers

Where is an up to date miRNA database and what happened to miRBase?

miRBase 21 was published June 26, 2014 and was still in its growth phase. Why is it no longer being updated, and why has the project not been officially declared dead? ENSEMBL also uses miRBase as a starting point…

asked Jun 07 '17 at 08:24

Ido Tamir

163
4

5

votes

2 answers

How to simulate nanopore reads?

I have already looked here: Tools for simulating Oxford Nanopore reads. This doesn't answer my question, because it lists a few Nanopore read simulators, but I have specific problems with each of them. I am trying to simulate some Nanopore reads. I…

asked Oct 23 '18 at 07:40

Fini

153
8

5

votes

3 answers

Script to allow gene set enrichment analysis of 10x genomics data in R

I have 10x single-cell RNA-seq data. Which R package is best suited for analysis of the 10x data matrix? What is the script to prepare the data for downstream GSEA analysis? I have already processed samples for single-cell data (for multiple other…

asked Oct 17 '18 at 18:46

Jay

51
1
3

5

votes

1 answer

vcftools: indel size histogram command returns empty file

I would like to get some summary statistics on a vcf file from one individual, which has over a million variant calls. I've tried to make a histogram of indel sizes with this command, vcftools --vcf sample.vcf --out sample --hist-indel-len but it…

asked Oct 15 '18 at 23:59

Joanne

305
1
5

5

votes

3 answers

finding unique headers in a fasta file using linux command line

I tried to use the following command: uniq -u reference.fasta >> reference_uniq.fasta. I'd like a count of the unique headers.

asked Oct 10 '18 at 02:32

crispr

51
3
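
For the question above about counting unique FASTA headers, a minimal shell sketch follows. It assumes "unique" means distinct header lines (lines starting with ">"). The function name count_unique_headers and the demo file are hypothetical and not taken from the question. Note that uniq on its own only compares adjacent lines, so the headers are sorted first.

```shell
#!/usr/bin/env bash
# Count distinct FASTA header lines (lines beginning with '>').
# count_unique_headers is a hypothetical helper name used for illustration.
count_unique_headers() {
    # Keep only header lines, collapse duplicates, count what is left.
    # tr strips the padding some wc implementations add.
    grep '^>' "$1" | sort -u | wc -l | tr -d ' '
}

# Hypothetical demo input, not the asker's reference.fasta.
demo=$(mktemp)
printf '>seq1\nACGT\n>seq2\nGGCC\n>seq1\nACGT\n' > "$demo"
count_unique_headers "$demo"
rm -f "$demo"
```

If the goal is instead to count headers that occur exactly once, replacing `sort -u` with `sort | uniq -u` gives that number.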

5

votes

2 answers

How to plot character state changes from a presence-absence matrix on to a phylogeny

I wish to assign character state changes from a presence-absence matrix to a phylogeny. Thus I want to identify the most parsimonious hypothesis for the node and branch on which a given "mutation" or "change" has occurred. I have tried assigning each…

asked Oct 08 '18 at 11:13

Gloom

151
2

5

votes

1 answer

Extracting sequences from FASTA beginning with common 5' end

I am trying to figure out the best way to extract sequences from a FASTA file which begin with a common 5' region of 43 nucleotides. Preferably, I would like to allow for "fuzziness" in this region to accommodate mutations or read overlaps. The…

asked Oct 03 '18 at 20:13

hunter92

53
2

5

votes

1 answer

Simulating DNA sequence evolution in R

I hope someone can lend their thoughts on the below code to generate DNA sequences under the Kimura-2-Parameter model of DNA substitution. The issue is that each time the code is run and the haplotype distribution is examined, there always is a very…

asked Oct 03 '18 at 19:19

compbiostats

153
3

5

votes

3 answers

Seurat Merged objects tSNE - How to paint on original IDs?

I am working with single-cell RNA-seq data, using the R package "Seurat" to cluster and visualize data points. I had two single-cell datasets from which I generated two Seurat objects. I then combined the two using MergeSeurat. I did differential gene…

asked Oct 02 '18 at 15:34

David Tatarakis

51
2

5

votes

1 answer

What is and how to detect a dephased read

In the methods of this paper, the authors say: "To simplify analysis, we first removed any dephased reads in our library (last 6 bases of read did not match the expected sequence)." I have read this post and this one, and I think that a dephased…

asked Sep 26 '18 at 22:12

gc5

1,783
18
32
