Most Popular

1500 questions
6 votes, 2 answers

What is the distribution of indel sizes in a healthy human genome? And of insertion:deletion ratios?

My understanding is that indels range from 1 bp to 10 kb, and a healthy genome has ~400K-500K indels. Surely most of these are small. What is the distribution of insertion sizes in a healthy human genome? What is the distribution of deletion sizes?…
ShanZhengYang
6 votes, 2 answers

How to estimate whether a long-read is meaningful sequence?

The setup: Imagine that I work on an organism without a reference genome, and that the closest reference genome I can get is quite diverged, e.g. ~10% diverged in terms of SNVs when measured with short reads, and also has a lot of structural variants…
roblanf
6 votes, 2 answers

Searching for gene expression data by cell line

I have two cancer cell lines (OCI-Ly18 & riva) that I want to find gene expression data for, but I'm not aware of many gene expression databases that allow searching by cell-line without searching by gene. I tried Genevestigator on the…
6 votes, 3 answers

How to find GEO data sets using Drug Bank ID and bioDBnet

I want to find all experiments in GEO that are associated with a drug (for example tolvaptan). Is there any quick and scalable way to do this? I want to query more than 100 drugs. I tried to use bioDBnet to map Drug Bank ID to look up data sets, but…
hhoomn
6 votes, 1 answer

Is there a way to assemble contigs starting from a specific sequence?

My work involves searching for marker genes/fragments in metagenomic databases (like the Sequence Read Archive). Once I find these sequences, I would like to know more about the neighboring genomic region. Is there a way I could assemble only…
Laura
6 votes, 1 answer

Help with 1D^2 library shearing

I've done my first trial of 1D^2 whole-genome sequencing using the LSK-SQK308 kit with an R9.5 flowcell. Without doing library fragmentation, I encountered unexpected library shearing during sequencing. But I didn't see a significant portion of…
yiyi_Z
6 votes, 3 answers

Why is it necessary to add hydrogen and delete water before protein-ligand docking?

What is the reason for adding hydrogen and removing unnecessary water molecules from the protein structure before protein-ligand docking? FYI, the tool I used for docking is GOLD.
Zheng Keong Ng
6 votes, 2 answers

Demultiplex nanopore reads with custom barcodes

We have a problem trying to demultiplex MinION sequences with custom barcodes. Do you have any software recommendations we can try for demultiplexing, or advice on how to demultiplex these custom barcodes with Albacore? We have tried using Albacore but it only…
6 votes, 3 answers

Identify non-coding regions from a genome annotation

I have this GTF file and I use the command below on a Linux machine to extract the coding regions of the genome: awk '{if($3=="transcript" && $20=="\"protein_coding\";"){print $0}}' gencode.gtf How could I do the inverse and keep only non-coding…
Zizogolu
6 votes, 1 answer

How should the SAM MD tag match the CIGAR string?

I am trying to understand how the MD:Z tag is used. The following is from the SAM Optional Fields Specification, which gives an example but is not thorough. The MD field aims to achieve SNP/indel calling without looking at the reference. For…
mattm
6 votes, 3 answers

Randomness in BLAST

If you look at the BLAST parameters, the documentation says: The Expected value E is a parameter that describes the number of hits one can "expect" to get by chance when searching a database of particular size. It decreases exponentially as the score (S)…
user37060
6 votes, 5 answers

How to create Phylogenetic Trees from fasta files in Python or R?

I have around a hundred Fasta files (and will collect several thousand) with DNA sequences and +50x coverage. What is a recommended method to construct a phylogenetic tree? Solutions in Python or R are sought. I found Phylo from Biopython only…
Soerendip
6 votes, 2 answers

How to correctly call a VCF file using damaged DNA? (IonTorrent & FFPE)

EDIT: I am updating this question to make it more specific to my issue. For context - original question prior to edit: How do I obtain a deamination metric when doing the variant calling using the IonTorrent variant caller, and secondly, how do I…
user36196
6 votes, 1 answer

Albacore basecalling running but outputs 0 reads

I am trying to basecall data produced by the MinION using the SQK-LSK109 kit and FLO-MIN106 flowcell via the command line. My version of Albacore is the latest (v2.3.4). I tried running using the following command line (the name of the sequencing…
d_kennetz
6 votes, 2 answers

How do I re-name the headers of my Fasta file?

I apologize for asking this, but I am really bad with regex... Can someone help me transform the headers of my FASTA files from…
Graham