Most Popular
1500 questions
9
votes
1 answer
Best practices for deciding if two structural variants are actually the same variant?
I know that I can use bcftools isec to compare two VCF files containing single nucleotide variants (SNVs); it will generate four possible output files: one with all of the unique called SNVs in file 1, one with all of the unique called SNVs in file…
mdperry
- 258
- 1
- 5
9
votes
1 answer
Confidence ellipses for MDS plot in edgeR?
Is it possible to draw e.g. 95% confidence ellipses around samples from the same group on the results from the plotMDS function under edgeR? If so, how?
Deffiz
- 153
- 1
- 2
9
votes
2 answers
Why Ti/Tv ratio?
I'm interested in the transition/transversion (Ti/Tv) ratio:
In substitution mutations, transitions are defined as the interchange of the purine-based A↔G or pyrimidine-based C↔T. Transversions are defined as the interchange between two-ring purine…
SmallChess
- 2,699
- 3
- 19
- 35
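To make the definition quoted in this question concrete, here is a minimal Python sketch that classifies a single-base substitution as a transition or a transversion and computes a Ti/Tv ratio from a list of (ref, alt) pairs. The function names `classify_substitution` and `titv_ratio` are hypothetical, chosen for illustration only, and the input is assumed to be plain A/C/G/T single-nucleotide substitutions rather than records parsed from a real VCF.

```python
# Purines have two rings (A, G); pyrimidines have one ring (C, T).
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base change.

    Hypothetical helper: assumes ref and alt are single A/C/G/T bases.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES:
        raise ValueError(f"not a single A/C/G/T base: {ref!r} -> {alt!r}")
    if ref == alt:
        raise ValueError("ref and alt are identical, so there is no substitution")
    # A transition stays within one chemical class (purine<->purine or
    # pyrimidine<->pyrimidine); a transversion crosses classes.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def titv_ratio(substitutions):
    """Compute the Ti/Tv ratio from an iterable of (ref, alt) pairs."""
    ti = tv = 0
    for ref, alt in substitutions:
        if classify_substitution(ref, alt) == "transition":
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ti / tv


if __name__ == "__main__":
    calls = [("A", "G"), ("C", "T"), ("A", "T")]
    print(titv_ratio(calls))  # 2 transitions, 1 transversion -> 2.0
```

Raising an error when no transversions are present is a design choice for this sketch; a real pipeline might prefer to report the raw counts instead.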
9
votes
2 answers
Is there a standard definition for "assembly polishing"?
Is there a standard definition for "assembly polishing" in the field?
Is there a standard definition for what polishing algorithms do?
My understanding of "polishing" is strongly influenced by Pilon:
Whereby users input a reference FASTA along…
EB2127
- 1,413
- 2
- 10
- 23
9
votes
2 answers
Converting a BAM file into VCF
I have NGS Illumina RNA-seq reads from M. musculus (mm10). I am trying to find variants along the strand portion of the reads in the refseq (mm10).
I mapped a pair of sequence files and generated a BAM file. Now I need to generate a VCF file from…
Lou_A
- 361
- 1
- 4
- 11
9
votes
2 answers
Why do BAM files created by different tools have different file sizes?
I have a BAM created by Picard. I want to filter alignments by flags with samtools view. However, I noticed that even if I apply no filters, the output BAM is different from my input BAM. Are BAMs produced by different tools also different in size?…
medbe
- 847
- 1
- 7
- 9
9
votes
3 answers
How to identify gene expression signatures from gene expression data?
I have TCGA gene expression data. I'm interested in identifying gene expression signatures using the data.
I would like to know whether there are any tools or R packages for identifying gene signatures.
How are gene signatures different from GSEA?
stack_learner
- 1,262
- 14
- 26
9
votes
1 answer
What are all the reference files produced by bwa index, and are these dependent upon whether the reference is zipped?
I have indexed a gzipped reference with bwa: bwa index reference.fa.gz, which produces a series of other files reference.fa.gz.{amb,ann,bwt,pac,sa}. These are working fine with bwa alignment.
I have discovered that samtools does not take a gzipped…
mattm
- 754
- 7
- 19
9
votes
3 answers
Convert R RNA-seq data object to a Python object
I have done some work in R and would like to try a Python tool.
What is a good way to import the data (and its annotations etc) as a Python object?
I am particularly interested in converting a Seurat object into an AnnData object. (Either directly…
Peter
- 2,634
- 15
- 33
9
votes
10 answers
Remove/delete sequences by ID from multifasta
I have a fasta file like this:
>Id1
ATCCTT
>Id2
ATTTTCCC
>Id3
TTTCCCCAAAA
>Id4
CCCTTTAAA
I want to delete sequences that have the following IDs.
Id2
Id3
The IDs are in a .txt file, and the text file will be used to match and delete those…
andresito
- 385
- 1
- 3
- 9
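One possible approach to the task described in this question, sketched in plain Python without any third-party library: stream through the FASTA lines, and each time a header line is reached decide whether the record that follows should be kept. The function name `filter_fasta` and the file names in the comments are hypothetical, and the sketch assumes the ID is the first whitespace-delimited token after `>`.

```python
def filter_fasta(lines, ids_to_remove):
    """Yield the FASTA lines whose record ID is not in ids_to_remove.

    Hypothetical helper: assumes the record ID is the first
    whitespace-delimited token after '>' on each header line.
    """
    keep = True
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            seq_id = tokens[0] if tokens else ""
            # The decision made at the header applies to every sequence
            # line until the next header.
            keep = seq_id not in ids_to_remove
        if keep:
            yield line


if __name__ == "__main__":
    # In-memory version of the example from the question.
    records = [
        ">Id1", "ATCCTT",
        ">Id2", "ATTTTCCC",
        ">Id3", "TTTCCCCAAAA",
        ">Id4", "CCCTTTAAA",
    ]
    unwanted = {"Id2", "Id3"}
    for line in filter_fasta(records, unwanted):
        print(line)

    # With files (names are assumptions, not from the question):
    # with open("ids.txt") as fh:
    #     unwanted = {row.strip() for row in fh if row.strip()}
    # with open("input.fasta") as src, open("filtered.fasta", "w") as dst:
    #     dst.writelines(filter_fasta(src, unwanted))
```

Because the function is a generator over lines, it never holds the whole file in memory, which matters for large multi-FASTA files.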
9
votes
1 answer
How to get the count of each kmer past 255 using khmer
I have a Fastq file and I want to get the exact count of each possible kmer from this file.
In a previous post, "How to use khmer to count k-mers?", Daniel Standage proposed a custom script based on khmer methods, shown below:
>>>…
hilta007
- 173
- 6
9
votes
5 answers
Using shells other than bash
As someone who's beginning to delve into bioinformatics, I'm noticing that, as in biology, there are industry standards here: much like Illumina in genomics and bowtie for alignment, many people use bash as their shell.
Is using a shell besides bash going…
EMiller
- 483
- 1
- 4
- 11
9
votes
2 answers
Is there a standard way to clean a PDB file and re-number its residues?
Is there a pre-existing tool which will tidy up the numbering of a PDB file?
Firstly, I would like to re-number the residues on inserts to make the icode an actual residue in the chain (by that I mean its own number, shifting everything after it,…
TW93
- 449
- 3
- 11
9
votes
2 answers
Tools to do VCF to MAF and MAF to VCF conversion?
Normally, I would use the vcf2maf scripts to convert a VCF to a MAF (or vice versa).
This is great software, but on my system, Perl scripts with dependencies are easy to break. (Here it uses VEP.)
Are there any other alternatives to this?
ShanZhengYang
- 1,691
- 1
- 14
- 20
9
votes
2 answers
How to transfer gff annotations in genome with extensive duplications?
Microbial genomes can contain extensive duplications. Often we'd like to transfer annotations from an annotated species to one that is newly sequenced.
Existing tools (e.g. RATT, LiftOver, Kraken) either make specific assumptions about how closely…
scalefreegan
- 91
- 2