Most Popular
1500 questions
6
votes
2 answers
What is the distribution of indel sizes in a healthy human genome? Of insertion:deletion ratios?
My understanding is that indels range from 1 bp to 10 kb, and that a healthy genome has ~400K-500K indels. Surely most of these are small.
What is the distribution of insertion sizes in a healthy human genome? What is the distribution of deletion sizes?…
ShanZhengYang
- 1,691
- 1
- 14
- 20
6
votes
2 answers
How to estimate whether a long-read is meaningful sequence?
The setup
Imagine that I work on an organism without a reference genome, and that the closest reference genome I can get is quite diverged. E.g. ~10% diverged in terms of SNVs when measured with short reads, and also has a lot of structural variants…
roblanf
- 962
- 7
- 15
6
votes
2 answers
Searching for gene expression data by cell line
I have two cancer cell lines (OCI-Ly18 & riva) that I want to find gene expression data for, but I'm not aware of many gene expression databases that allow searching by cell-line without searching by gene.
I tried Genevestigator on the…
JSneathThompson
- 157
- 4
6
votes
3 answers
How to find GEO data sets using Drug Bank ID and bioDBnet
I want to find all experiments in GEO that are associated with a drug (for example tolvaptan). Is there any quick and scalable way to do this? I want to query more than 100 drugs.
I tried to use bioDBnet to map Drug Bank ID to look up data sets, but…
hhoomn
- 325
- 1
- 5
6
votes
1 answer
Is there a way to assemble contigs starting from a specific sequence?
My work involves searching for marker genes/fragments in metagenomic databases (like the Sequence Read Archive). Once I find these sequences, I would like to know more about the neighboring genomic region.
Is there a way I could assemble only…
Laura
- 909
- 5
- 11
6
votes
1 answer
Help with 1D^2 library shearing
I ran my first trial of 1D^2 whole-genome sequencing using the LSK-SQK308 kit with an R9.5 flowcell. Although I did not fragment the library, I encountered unexpected library shearing during sequencing.
But I didn't see a significant portion of…
yiyi_Z
- 61
- 2
6
votes
3 answers
Why is it necessary to add hydrogen and delete water before protein-ligand docking?
What is the reason for adding hydrogen and removing unnecessary water molecules from the protein structure before protein-ligand docking? FYI, the tool I used for docking is GOLD.
Zheng Keong Ng
- 199
- 3
- 9
6
votes
2 answers
Demultiplex nanopore reads with custom barcodes
We have a problem trying to demultiplex MinION sequences with custom barcodes. Can you recommend software for demultiplexing, or explain how to demultiplex these custom barcodes with Albacore? We have tried using Albacore but it only…
Martín Terán
- 63
- 1
- 4
6
votes
3 answers
Identify non-coding regions from a genome annotation
I have this GTF file and I use the command below on a Linux machine to extract the coding regions of the genome:
awk '{if($3=="transcript" && $20=="\"protein_coding\";"){print $0}}' gencode.gtf
How could I do the inverse and keep only non-coding…
Zizogolu
- 2,148
- 11
- 44
6
votes
1 answer
How should the SAM MD tag match the CIGAR string?
I am trying to understand how the MD:Z tag is used. The following is from the SAM Optional Fields Specification, which gives an example but is not thorough.
The MD field aims to achieve SNP/indel calling without looking at the
reference. For…
mattm
- 754
- 7
- 19
6
votes
3 answers
Randomness in BLAST
If you look at the BLAST parameters, it says:
The Expected value E is a parameter that describes the number of hits one can "expect" to get by chance when searching a database of particular size. It decreases exponentially as the score (S)…
user37060
- 61
- 1
6
votes
5 answers
How to create Phylogenetic Trees from fasta files in Python or R?
I have around a hundred FASTA files (and will collect several thousand) with DNA sequences and +50x coverage. What is a recommended method to construct a phylogenetic tree? Solutions in Python or R are sought.
I found Phylo from Biopython only…
Soerendip
- 1,295
- 11
- 22
6
votes
2 answers
How to correctly call a VCF file using damaged DNA? (IonTorrent & FFPE)
EDIT: I am updating this question to make it more specific to my issue.
For context - original question prior to edit:
How do I obtain a deamination metric when doing the variant calling using the IonTorrent variant caller, and secondly, how do I…
user36196
- 291
- 1
- 6
6
votes
1 answer
Albacore basecalling running but outputs 0 reads
I am trying to basecall data produced by the MinION using the SQK-LSK109 kit and FLO-MIN106 flowcell via the command line. My version of Albacore is the latest (v2.3.4). I tried running it with the following command line (the name of the sequencing…
d_kennetz
- 631
- 5
- 17
6
votes
2 answers
How do I re-name the headers of my Fasta file?
I apologize for asking this, but I am really bad with regex...
Can someone help me transform the headers of my fasta files from…
Graham
- 63
- 5