Most Popular

1500 questions
6
votes
2 answers

Deciding which samples go in which batch

I have 370 samples to sequence; we will probably end up using only 96 samples per run (because of the barcodes on the primers we'll use). This means running 4 batches. To minimize the batch effect I need to decide which samples are sequenced in which…
llrs
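One simple starting point for the batching question above is plain randomization: shuffle the sample IDs and cut the shuffled list into runs of at most 96. The sketch below assumes nothing about covariates (the excerpt does not list any), so it does not balance groups across batches. The sample names and the `assign_batches` helper are hypothetical.

```python
import random

def assign_batches(sample_ids, batch_size=96, seed=0):
    """Shuffle the samples and cut the shuffled list into consecutive batches."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]

# Hypothetical sample names standing in for the 370 real samples.
samples = [f"S{i:03d}" for i in range(1, 371)]
batches = assign_batches(samples)
print([len(b) for b in batches])  # [96, 96, 96, 82]
```

If known covariates (for example, condition or collection date) matter, shuffling within each stratum and dealing samples out across batches would be a natural extension.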
6
votes
1 answer

What is the current state-of-the-art in assembling hybrid transcriptomes?

We are considering attempting de novo assembly of a species' transcriptome (i.e. without a reference genome) using the combined NGS outputs of Iso-seq and Illumina. One example I saw (Li et al 2017), used the standard PacBio tools to assemble a…
Ian Sudbery
6
votes
2 answers

Meaning of the FORMAT fields of the VCF file coming from GIAB project

After reading the GIAB paper in https://www.biorxiv.org/content/early/2018/05/25/281006 and its Figure 1, I am still having trouble understanding the data inside the GIAB VCF file for HG001…
Javier
6
votes
3 answers

Relationship between sequencing lane and ngs dataset

I am fairly new to NGS data analysis and I am struggling to understand the exact relationship between a sequencing lane and an NGS dataset. I should add that I don't work in the lab; I only do bioinformatics. I understand the basics of how to make NGS…
mf94
6
votes
1 answer

Generating random BED intervals given constraints

The problem is to generate a random BED interval given the following constraints: minimum start; maximum end; fixed length; maximum number of masked bases (similar to the -maxN option in faSplit); set of intervals to avoid overlap with; stay within chromosome…
victorlin
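A subset of the constraints listed above (minimum start, maximum end, fixed length, intervals to avoid) can be sketched with rejection sampling. This is only an illustration under those assumptions: it ignores the masked-base limit and chromosome sizes, treats coordinates as 0-based half-open as in BED, and the function names are hypothetical.

```python
import random

def overlaps(start, end, intervals):
    """True if half-open [start, end) overlaps any half-open interval in the list."""
    return any(start < e and s < end for s, e in intervals)

def random_interval(min_start, max_end, length, avoid=(), rng=None, max_tries=1000):
    """Draw a fixed-length interval inside [min_start, max_end) that avoids `avoid`.

    Returns (start, end), or None if no valid interval is found in max_tries draws.
    """
    rng = rng or random.Random()
    if max_end - min_start < length:
        raise ValueError("window is shorter than the requested length")
    for _ in range(max_tries):
        # randint is inclusive on both ends, so the interval never passes max_end
        start = rng.randint(min_start, max_end - length)
        end = start + length
        if not overlaps(start, end, avoid):
            return start, end
    return None

print(random_interval(0, 100, 10, avoid=[(0, 50)], rng=random.Random(1)))
```

Rejection sampling is simple but slows down when most of the window is excluded; sampling directly from the allowed gaps would avoid that.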
6
votes
3 answers

How to subset a GRanges via an argument passed into a function?

Let's say I have the following example GRanges: > library(GenomicRanges) > gr = GRanges( + seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), + ranges = IRanges(101:110, end = 111:120, names = head(letters, 10)), + strand =…
EB2127
6
votes
1 answer

Detecting structural variants with MinION data

Working on various cancers, I have an interest in detecting structural variation (SV) in humans. We've successfully used various tools, such as Pindel, SVDetect, Manta, and LUMPY, to name a few, for detecting SVs in Illumina short-read sequencing data. I…
Matt Bashton
6
votes
1 answer

Why aren't clustal alignments stable under deletion of some of the sequences?

I'm new to pairwise and multiple sequence alignment in general, but I thought I understood how clustal works -- k-tuple distance is a cheap metric that's 'probably approximately' monotonic in the real global pairwise-alignment score, so we can just…
user49404
6
votes
2 answers

Number of reference sequences in a SAM file

From a single record, I can get the reference sequence ID from the field rID, but how can I get the total number of different reference sequences stored in the whole SAM file? It can be as simple as just looping through all records, but I need some…
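As one possible alternative to looping through all records, the reference sequences declared in a SAM file can be counted from its header, where each reference has its own `@SQ` line. The sketch below assumes the file has a header; it counts declared references, not only those that actually have reads aligned to them. The helper name and the toy data are hypothetical.

```python
import io

def count_reference_sequences(sam_lines):
    """Count @SQ header lines; stop at the first alignment record."""
    count = 0
    for line in sam_lines:
        if not line.startswith("@"):
            break  # header lines come first, so the header has ended
        if line.startswith("@SQ\t"):
            count += 1
    return count

# Toy SAM content standing in for a real file handle.
example = io.StringIO(
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@SQ\tSN:chr2\tLN:2000\n"
    "read1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
)
print(count_reference_sequences(example))  # 2
```

With a real file, passing an open text handle (`open(path)`) in place of the `StringIO` object should work the same way.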
6
votes
1 answer

How to validate that BAMs have been downloaded correctly?

I currently have several hundred BAM files which were downloaded by someone else. These have remained untouched; before working with them, I would like to double-check that these BAMs have been fully downloaded. I don't have MD5 checksums to look at.…
EB2127
6
votes
1 answer

What is the difference between SAM mapping quality and BLAST E-value?

BLAST reports E-values, but short-read mappers report mapping qualities. Are they the same thing? Can they be converted to each other? If not, why doesn't BLAST report mapping quality, and why don't short-read mappers report E-values?
medbe
6
votes
3 answers

If I modify a PDB file with a specific mutation, how to minimize energy?

I have a series of proteins that are all phylogenetically related. Only one of these proteins currently has an X-ray crystal structure. Is it valid to modify the reference PDB file (I mean by hand) to match the mutated sequence? And if so, is there any way…
6
votes
2 answers

Why choose a CNN for a variant caller?

Google released their variant caller DeepVariant, which won the highest SNP performance award in the Precision FDA Truth challenge (99.999% accuracy). From the linked GitHub repo, we see that DeepVariant is a CNN: we provide images of the reads as an…
6
votes
3 answers

Volcano plot in R

This question has also been asked on Biostars. How can I reproduce this volcano plot? I'm only able to do the traditional one; I'm kind of new to this field.
Sofia
6
votes
2 answers

Validating identified sub-populations of cells in scRNA-seq

In the analysis of single-cell RNA-seq data there are different unsupervised approaches to identify putative subpopulations (e.g. as available with the Seurat or SCDE packages). Is there a good way of computationally validating the cluster solutions?…
Deffiz