Highest Voted Questions - Bioinformatics Stack Exchange

6

votes

2 answers

What is the distribution of indel sizes in a healthy human genome? of insertion:deletion ratios?

My understanding is that indels range from 1 bp to 10 kb, and a healthy genome has ~400K-500K indels. Surely most of these are small. What is the distribution of insertion sizes in a healthy human genome? What is the distribution of deletion sizes?…

asked Jun 19 '17 at 20:49

ShanZhengYang

1,691
1
14
20

6

votes

2 answers

How to estimate whether a long-read is meaningful sequence?

The setup Imagine that I work on an organism without a reference genome, and that the closest reference genome I can get is quite diverged. E.g. ~10% diverged in terms of SNVs when measured with short reads, and also has a lot of structural variants…

asked Jun 19 '17 at 00:24

roblanf

962
7
15

6

votes

2 answers

Searching for gene expression data by cell line

I have two cancer cell lines (OCI-Ly18 & riva) that I want to find gene expression data for, but I'm not aware of many gene expression databases that allow searching by cell-line without searching by gene. I tried Genevestigator on the…

asked Jun 16 '17 at 11:42

JSneathThompson

157
4

6

votes

3 answers

How to find GEO data sets using Drug Bank ID and bioDBnet

I want to find all experiments in GEO that are associated with a drug (for example tolvaptan). Is there any quick and scalable way to do this? I want to query more than 100 drugs. I tried to use bioDBnet to map Drug Bank ID to look up data sets, but…

asked Jun 15 '17 at 13:09

hhoomn

325
1
5

6

votes

1 answer

Is there a way to assemble contigs starting from a specific sequence?

My work involves searching for marker genes/fragments in metagenomic databases (like the Sequence Read Archive). Once I find these sequences, I would like to know more about the neighboring genomic region. Is there a way I could assemble only…

asked Apr 09 '19 at 11:50

Laura

909
5
11

6

votes

1 answer

Help with 1D^2 library shearing

I've done my first trial of 1D^2 whole-genome sequencing using the LSK-SQK308 kit with an R9.5 flowcell. Without doing library fragmentation, I encountered unexpected library shearing during sequencing. But I didn't see a significant portion of…

asked Mar 27 '19 at 07:05

yiyi_Z

61
2

6

votes

3 answers

Why is it necessary to add hydrogen and delete water before protein-ligand docking?

What is the reason for adding hydrogen and removing unnecessary water molecules from the protein structure before protein-ligand docking? FYI, the tool I used for docking is GOLD.

asked Mar 27 '19 at 00:45

Zheng Keong Ng

199
3
9

6

votes

2 answers

Demultiplex nanopore reads with custom barcodes

We have a problem trying to demultiplex MinION sequences with custom barcodes. Do you have any software recommendations we can try for demultiplexing or how to demultiplex these custom barcodes with Albacore? We have tried using albacore but it only…

asked Feb 28 '19 at 16:18

Martín Terán

63
1
4

6

votes

3 answers

Identify non-coding regions from a genome annotation

I have this GTF file and I use the command below on a Linux machine to extract the coding regions of the genome: awk '{if($3=="transcript" && $20=="\"protein_coding\";"){print $0}}' gencode.gtf How could I do the inverse and keep only non coding…
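As a rough illustration of the filter quoted in this excerpt, here is a minimal Python sketch that mirrors the awk test (whitespace-split field 3 equal to `transcript`, field 20 equal to `"protein_coding";`) and then inverts it. The function names are hypothetical, and it assumes "inverse" means keeping transcript lines whose field 20 is anything other than `"protein_coding";`. The position of the transcript type depends on the attribute order in the particular GTF, so field 20 is taken from the quoted command rather than from any specification.

```python
def is_protein_coding_transcript(line: str) -> bool:
    """Same test as the awk command: $3 == "transcript" and $20 == "\"protein_coding\";"."""
    fields = line.split()  # like awk's default, splits on runs of spaces and tabs
    return (
        len(fields) >= 20
        and fields[2] == "transcript"
        and fields[19] == '"protein_coding";'
    )


def non_coding_transcripts(lines):
    """Yield transcript lines that the awk filter would NOT have printed."""
    for line in lines:
        if line.startswith("#"):  # skip GTF header/comment lines
            continue
        fields = line.split()
        is_transcript = len(fields) >= 3 and fields[2] == "transcript"
        if is_transcript and not is_protein_coding_transcript(line):
            yield line.rstrip("\n")
```

To run it over a file, pass an open file handle, for example `non_coding_transcripts(open("gencode.gtf"))`, and write the yielded lines to the output of your choice.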

asked Feb 22 '19 at 12:20

Zizogolu

2,148
11
44

6

votes

1 answer

How should the SAM MD tag match the CIGAR string?

I am trying to understand how the MD:Z tag is used. The following is from the SAM Optional Fields Specification, which gives an example but is not thorough. The MD field aims to achieve SNP/indel calling without looking at the reference. For…

asked Jun 13 '17 at 16:24

mattm

754
7
19

6

votes

3 answers

Randomness in BLAST

The description of the BLAST parameters says: The Expected value E is a parameter that describes the number of hits one can "expect" to get by chance when searching a database of particular size. It decreases exponentially as the score (S)…
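The exponential relationship mentioned in this excerpt is usually written as the Karlin-Altschul formula, E = K * m * n * exp(-lambda * S), where m and n are the query and database lengths. The sketch below only illustrates that shape. The function name is hypothetical, and the default values of K and lambda are illustrative assumptions, since the real values depend on the scoring matrix and gap penalties in use.

```python
import math


def expected_hits(score, query_len, db_len, k=0.041, lam=0.267):
    """Karlin-Altschul estimate E = K * m * n * exp(-lambda * S).

    k and lam are placeholder values for illustration only; BLAST derives
    them from the scoring system actually used in the search.
    """
    return k * query_len * db_len * math.exp(-lam * score)


# Raising the score by 10 shrinks E by a factor of exp(-lam * 10),
# whatever the query and database sizes are.
e_low = expected_hits(score=50, query_len=100, db_len=1_000_000)
e_high = expected_hits(score=60, query_len=100, db_len=1_000_000)
```

Note that the formula is deterministic: E is an expected count of chance hits for a given score and search space, not a random draw.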

blast

asked Feb 14 '19 at 21:47

user37060

61
1

6

votes

5 answers

How to create Phylogenetic Trees from fasta files in Python or R?

I have around a hundred Fasta files (and will collect several thousand) with DNA sequences and +50x coverage. What is a recommended method to construct a phylogenetic tree? Solutions in Python or R are sought. I found Phylo from Biopython only…

asked Feb 13 '19 at 17:07

Soerendip

1,295
11
22

6

votes

2 answers

How to correctly call a VCF file using damaged DNA? (IonTorrent & FFPE)

EDIT: I am updating this question to make it more specific to my issue. For context - original question prior to edit: How do I obtain a deamination metric when doing the variant calling using the IonTorrent variant caller, and secondly, how do I…

variant-calling

asked Jan 30 '19 at 18:34

user36196

291
1
6

6

votes

1 answer

Albacore basecalling running but outputs 0 reads

I am trying to basecall data produced by the MinION using the SQK-LSK109 kit and FLO-MIN106 flowcell via the command-line. My version of albacore is the latest (v2.3.4). I tried running using the following command-line (the name of the sequencing…

asked Jan 22 '19 at 20:26

d_kennetz

631
5
17

6

votes

2 answers

How do I re-name the headers of my Fasta file?

I apologize for asking this, but I am really bad with regex... Can someone help me transform the headers of my fasta files from…

asked Jan 19 '19 at 01:29

Graham

63
5
