Most Popular

1500 questions
5 votes, 1 answer

Bash script error at paste command

I wrote a script for pasting rsIDs onto CADD output. Here is the script:
#!/bin/bash
cd tmp
cut -f 1,2 CADD.tsv > fileA
paste fileA <(cut -f 2,125 CADD.tsv) > myNewFile
bedtools intersect -a myNewFile -b New.vcf -wb | cut -f 1-4,7 > CADD.rsids.tsv
I have…
Sarah • 105 • 5
5 votes, 2 answers

What does an FDR value of 1 in RNA-seq mean?

I am looking at the supplemental data from the paper "An allelic series of miR-17∼92-mutant mice uncovers functional specialization and cooperation among members of a microRNA polycistron" which lists the genes that are differentially expressed…
leah
5 votes, 3 answers

bcftools filtering all files in a directory

Probably a silly oversight on my part, but I'm trying to filter all the vcfs in a directory with bcftools using a simple loop. My basic command is working fine: bcftools filter -i 'QUAL > 1000' -o filter/file1out.vcf file1in.vcf but when I try to…
Joanne • 305 • 1 • 5
5 votes, 1 answer

Does the ".full.aln" file produced by snippy-core contain all bases of my input sequences aligned to the reference genome?

I have a number of sequences and a reference genome. I used snippy to align each individual sequence with the reference genome. I then used snippy-core on the *.aln files from snippy which produced a new *.aln file, and a *.full.aln. It is my…
Fabien • 51 • 2
5 votes, 2 answers

Full visualisation of draft genome alignment

I have two draft genomes (approx. 500 Mb each), one with ~10'000 scaffolds (genome1.fasta) and one with ~1'000 contigs (genome2.fasta). I would like to visualize their pairwise alignment, which a while ago I found could easily be done with mummerplot…
aechchiki • 2,676 • 11 • 34
5 votes, 5 answers

Where is an up to date miRNA database and what happened to miRBase?

miRBase 21 was published on June 26, 2014, and was still in its growth phase. Why is it no longer being updated, and why has the project not been declared officially dead? ENSEMBL also uses miRBase as a starting point…
Ido Tamir • 163 • 4
5 votes, 2 answers

How to simulate nanopore reads?

I have already looked here: Tools for simulating Oxford Nanopore reads. This doesn't answer my question, because it lists a few Nanopore read simulators, but I have specific problems with each of them. I am trying to simulate some Nanopore reads. I…
Fini • 153 • 8
5 votes, 3 answers

Script to allow gene set enrichment analysis of 10x Genomics data in R

I have 10x single-cell RNA-seq data. Which R package is best suited for analysis of the 10x data matrix? What is the script to prepare the data for downstream GSEA? I have already processed samples for single cell data (for multiple other…
Jay • 51 • 1 • 3
5 votes, 1 answer

vcftools: indel size histogram command returns empty file

I would like to get some summary statistics on a vcf file from one individual, which has over a million variant calls. I've tried to make a histogram of indel sizes with this command, vcftools --vcf sample.vcf --out sample --hist-indel-len but it…
Joanne • 305 • 1 • 5
5 votes, 3 answers

Finding unique headers in a FASTA file using the Linux command line

I tried to use the following command: uniq -u reference.fasta >> reference_uniq.fasta. I'd like a count of the unique headers.
crispr • 51 • 3
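
For the question above, a minimal sketch of one way to count headers, assuming header lines are the ones starting with ">" and that "unique" means distinct. The function names are hypothetical; only reference.fasta comes from the question. Note that uniq compares adjacent lines only, so the input has to be sorted first, and running it on the whole file mixes sequence lines in with the headers.

```shell
#!/bin/bash

# Hypothetical helper: number of distinct header lines in a FASTA file.
count_unique_headers() {
    # Keep only header lines, collapse duplicates, count what remains.
    grep '^>' "$1" | sort -u | wc -l
}

# Hypothetical helper: number of headers that occur exactly once.
# uniq -u only drops repeated lines when they are adjacent, hence the sort.
count_singleton_headers() {
    grep '^>' "$1" | sort | uniq -u | wc -l
}
```

With the file from the question this would be called as count_unique_headers reference.fasta. Which of the two counts is wanted depends on whether repeated headers should be counted once or excluded entirely.
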
5 votes, 2 answers

How to plot character state changes from a presence-absence matrix onto a phylogeny

I wish to assign character state changes from a presence-absence matrix to a phylogeny. Thus I want to identify the most parsimonious hypothesis for the node and branch on which a given "mutation" or "change" has occurred. I have tried assigning each…
Gloom • 151 • 2
5 votes, 1 answer

Extracting sequences from FASTA beginning with common 5' end

I am trying to figure out the best way to extract sequences from a FASTA file which begin with a common 5' region of 43 nucleotides. Preferably, I would like to allow for "fuzziness" in this region to allow for mutations or read overlaps. The…
hunter92 • 53 • 2
5 votes, 1 answer

Simulating DNA sequence evolution in R

I hope someone can lend their thoughts on the below code to generate DNA sequences under the Kimura-2-Parameter model of DNA substitution. The issue is that each time the code is run and the haplotype distribution is examined, there always is a very…
compbiostats • 153 • 3
5 votes, 3 answers

Seurat Merged objects tSNE - How to paint on original IDs?

I am working with single-cell RNA-seq data, using the R package "Seurat" to cluster and visualize data points. I had two single-cell datasets from which I generated two Seurat objects. I then combined the two using MergeSeurat. I did differential gene…
5 votes, 1 answer

What is a dephased read, and how can it be detected?

In the methods of this paper, the authors say: To simplify analysis, we first removed any dephased reads in our library (last 6 bases of read did not match the expected sequence). I have read this post and this one, and I think that a dephased…
gc5 • 1,783 • 18 • 32
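
Going only by the definition quoted in the question above (the last 6 bases of the read do not match the expected sequence), such a filter might be sketched as follows. This is an illustration under assumptions, not the paper's method: the function name is hypothetical, the expected sequence is not given in the excerpt and is passed in as a parameter, and the input is assumed to be FASTQ with exactly four lines per record.

```shell
#!/bin/bash

# Hypothetical helper: keep_phased FASTQ EXPECTED_TAIL
# Prints only the records whose read ends with EXPECTED_TAIL,
# i.e. drops reads that would count as dephased under the quoted definition.
keep_phased() {
    awk -v tail="$2" '
        NR % 4 == 1 { header = $0 }   # @read_id line
        NR % 4 == 2 { seq = $0 }      # base calls
        NR % 4 == 3 { plus = $0 }     # separator line
        NR % 4 == 0 {                 # quality line completes the record
            n = length(tail)
            # Compare the last n bases of the read with the expected tail.
            if (substr(seq, length(seq) - n + 1) == tail)
                print header "\n" seq "\n" plus "\n" $0
        }
    ' "$1"
}
```

A call would look like keep_phased reads.fastq CCGGTT, where both the file name and the 6-base tail are placeholders.
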