Most Popular
1500 questions
6
votes
2 answers
Deciding which samples go in which batch
I have 370 samples to sequence, and we will probably end up using only 96 samples per run (due to the barcodes on the primers we'll use). This means running 4 batches. To minimize the batch effect, I need to decide which samples are sequenced in which…
llrs
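One possible way to approach the batch question above is a stratified random allocation: shuffle the samples within each biological group, then deal them round-robin into the batches so that every group is spread as evenly as possible. This is only a minimal sketch, not the asker's method; the `groups` labels, the function name `assign_batches`, and the fixed seed are assumptions made for illustration.

```python
import random
from collections import defaultdict


def assign_batches(samples, groups, n_batches=4, capacity=96, seed=0):
    """Spread each group evenly over the batches (hypothetical helper)."""
    if len(samples) > n_batches * capacity:
        raise ValueError("not enough batch capacity for all samples")
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible

    # Collect the samples belonging to each group label.
    by_group = defaultdict(list)
    for sample, group in zip(samples, groups):
        by_group[group].append(sample)

    # Shuffle within each group, then lay the groups end to end.
    ordered = []
    for group in sorted(by_group):
        members = by_group[group]
        rng.shuffle(members)
        ordered.extend(members)

    # Deal round-robin: neighbours in `ordered` (same group) land in
    # different batches, so per-batch group counts differ by at most one.
    batches = [[] for _ in range(n_batches)]
    for i, sample in enumerate(ordered):
        batches[i % n_batches].append(sample)
    return batches


# Illustrative input only: 370 sample IDs with made-up group labels.
samples = [f"S{i:03d}" for i in range(370)]
groups = ["case" if i % 3 == 0 else "control" for i in range(370)]
batches = assign_batches(samples, groups)
print([len(b) for b in batches])  # → [93, 93, 92, 92]
```

With more than one covariate (for example sex and collection site), the group label could be a tuple of those values; whether that is enough to control the batch effect depends on the study design.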
6
votes
1 answer
What is the current state-of-the-art in assembling hybrid transcriptomes?
We are considering attempting de novo assembly of a species' transcriptome (i.e. without a reference genome) using the combined NGS outputs of Iso-seq and Illumina.
One example I saw (Li et al 2017), used the standard PacBio tools to assemble a…
Ian Sudbery
6
votes
2 answers
Meaning of the FORMAT fields of the VCF file coming from GIAB project
After reading the GIAB paper in https://www.biorxiv.org/content/early/2018/05/25/281006 and its Figure 1, I am still having trouble understanding the data inside the GIAB VCF file for HG001…
Javier
6
votes
3 answers
Relationship between sequencing lane and NGS dataset
I am fairly new to NGS data analysis and I am struggling to understand the exact relationship between a sequencing lane and an NGS dataset. I should add I don't work in the lab, I only do bioinformatics.
I understand the basics of how to make NGS…
mf94
6
votes
1 answer
Generating random BED intervals given constraints
The problem is to generate a random BED interval given the following constraints:
minimum start
maximum end
fixed length
maximum number of masked bases (similar to -maxN option in faSplit)
set of intervals to avoid overlap with
stay within chromosome…
victorlin
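The constraints listed in the question above lend themselves to rejection sampling: draw a random start within the allowed window, then discard the candidate if it overlaps a forbidden interval or contains too many masked bases. The sketch below assumes BED-style 0-based half-open coordinates, that "masked" means the base `N`, and that the chromosome sequence is available as a string; the function names and the `max_tries` limit are hypothetical.

```python
import random


def overlaps(start, end, intervals):
    """True if half-open [start, end) overlaps any half-open interval."""
    return any(start < b_end and b_start < end for b_start, b_end in intervals)


def random_interval(seq, length, min_start, max_end, avoid, max_masked,
                    rng, max_tries=10000):
    """Return (start, end) satisfying all constraints, or None if none found."""
    lo = max(0, min_start)                 # stay within the chromosome
    hi = min(len(seq), max_end) - length   # last start that still fits
    if hi < lo:
        return None                        # window too small for this length
    for _ in range(max_tries):
        start = rng.randint(lo, hi)        # randint is inclusive on both ends
        end = start + length
        if overlaps(start, end, avoid):
            continue                       # hits an interval to avoid
        if seq[start:end].upper().count("N") > max_masked:
            continue                       # too many masked bases
        return start, end
    return None


# Illustrative input only: a toy 220 bp "chromosome" with a 20 bp N block.
chrom = "ACGT" * 25 + "N" * 20 + "ACGT" * 25
print(random_interval(chrom, 10, 0, len(chrom), [(0, 50)], 0, random.Random(1)))
```

Rejection sampling is simple but can become slow when most of the window is excluded; in that case, enumerating the valid start positions first and sampling from them may be preferable.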
6
votes
3 answers
How to subset a GRanges via an argument passed into a function?
Let's say I have the following example GRanges:
> library(GenomicRanges)
> gr = GRanges(
+ seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
+ ranges = IRanges(101:110, end = 111:120, names = head(letters, 10)),
+ strand =…
EB2127
6
votes
1 answer
Detecting structural variants with MinION data
Working on various cancers, I have an interest in detecting structural variation (SV) in humans. We've successfully used various tools, such as Pindel, SVDetect, Manta, and LUMPY, to name a few, for detecting SVs in Illumina short-read sequencing. I…
Matt Bashton
6
votes
1 answer
Why aren't clustal alignments stable under deletion of some of the sequences?
I'm new to pairwise and multiple sequence alignment in general, but I thought I understood how clustal works -- k-tuple distance is a cheap metric that's 'probably approximately' monotonic in the real global pairwise-alignment score, so we can just…
user49404
6
votes
2 answers
Number of reference sequences in a SAM file
From a single record, I can get the reference sequence ID from the field rID, but how can I get the total number of different reference sequences stored in the whole SAM file?
It can be as simple as just looping through all records, but I need some…
Omair Iqbal
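For the SAM question above, one option that avoids looping over the records is to read the header: each `@SQ` line declares one reference sequence, with its name in the `SN:` tag. The sketch below parses plain-text SAM lines and assumes the file has a header; note that it counts the references declared in the header, which may include sequences that no read maps to. The function name is hypothetical.

```python
def count_reference_sequences(sam_lines):
    """Count distinct reference names declared in @SQ header lines."""
    names = set()
    for line in sam_lines:
        if not line.startswith("@"):
            break  # header lines precede all alignment records
        if line.startswith("@SQ"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return len(names)


# Illustrative input only: a tiny made-up SAM file held as a list of lines.
example = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "@SQ\tSN:chr1\tLN:1000\n",
    "@SQ\tSN:chr2\tLN:2000\n",
    "read1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
]
print(count_reference_sequences(example))  # → 2
```

To use it on a file, pass an open text handle, for example `count_reference_sequences(open(path))`, where `path` is whatever SAM file is being inspected.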
6
votes
1 answer
How to validate that BAMs have been downloaded correctly?
I currently have several hundred BAM files which were downloaded by someone else. These have remained untouched---before working with them, I would like to double-check that these BAMs have been fully downloaded.
I don't have MD5 checksums to look at.…
EB2127
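One inexpensive check for the situation described above: a complete BAM file ends with a fixed 28-byte BGZF end-of-file block, so a file cut off mid-download will usually lack it. The sketch below tests only for that marker; it is an assumption-laden minimum, since it detects truncation but not corruption elsewhere in the file. The function name is hypothetical. `samtools quickcheck` performs a similar test from the command line.

```python
import os

# The 28-byte empty BGZF block that terminates a complete BAM file,
# as given in the SAM/BAM format specification.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600"
    "42430200" "1b00" "0300"
    "00000000" "00000000"
)


def has_bgzf_eof(path):
    """True if the file at `path` ends with the BGZF end-of-file marker."""
    size = os.path.getsize(path)
    if size < len(BGZF_EOF):
        return False  # too short to hold the marker at all
    with open(path, "rb") as handle:
        handle.seek(-len(BGZF_EOF), os.SEEK_END)  # jump to the last 28 bytes
        return handle.read() == BGZF_EOF
```

A file that passes can still be damaged in the middle, so where the original source still publishes checksums or file sizes, comparing against those remains the stronger test.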
6
votes
1 answer
What is the difference between SAM mapping quality and Blast E-value?
BLAST reports E-values, but short-read mappers report mapping qualities. Are they the same thing? Can they be converted to each other? If not, why doesn't BLAST report mapping quality, and why don't short-read mappers report E-values?
medbe
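As background to the question above, the SAM specification defines mapping quality as −10·log10 of the probability that the mapping position is wrong, rounded to the nearest integer. The sketch below shows only that Phred-style relationship; the function names are hypothetical. A BLAST E-value is an expected number of chance hits in a database search rather than a per-read error probability, so the sketch does not attempt to convert between the two.

```python
import math


def mapq_from_error_prob(p_wrong):
    """Phred-scale a mapping error probability (0 < p_wrong <= 1)."""
    if not 0 < p_wrong <= 1:
        raise ValueError("p_wrong must be in (0, 1]")
    return round(-10 * math.log10(p_wrong))


def error_prob_from_mapq(mapq):
    """Invert the Phred scale: the error probability implied by a MAPQ."""
    return 10 ** (-mapq / 10)


print(mapq_from_error_prob(0.001))  # → 30
print(error_prob_from_mapq(20))     # → 0.01
```

In practice, mappers estimate the error probability in different ways and cap the reported value differently, so MAPQ values are not necessarily comparable across tools.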
6
votes
3 answers
If I modify a PDB file with a specific mutation, how to minimize energy?
I have a series of proteins that are all phylogenetically related. Only one of these proteins has currently been X-ray crystallized.
Is it valid to modify the reference PDB file (I mean by hand) to match the mutated sequence?
And if so, is there any way…
The Sauralph
6
votes
2 answers
Why choose a CNN for a variant caller
Google released their variant caller DeepVariant which won the highest SNP performance award in the Precision FDA Truth challenge (99.999% accuracy).
From the linked GitHub repo, we see that DeepVariant is a CNN: we provide images of the reads as an…
Claudiu Creanga
6
votes
3 answers
Volcano plot in R
This question has also been asked on biostars
How can I reproduce this volcano plot?
I'm only able to do the traditional one; I'm kind of new to this field.
Sofia
6
votes
2 answers
validating identified sub-populations of cells in scRNA-seq
In the analysis of single-cell RNA-seq data there are different unsupervised approaches to identify putative subpopulations (e.g. as available with the Seurat or SCDE packages).
Is there a good way of computationally validating the cluster solutions?…
Deffiz