Most Popular
1500 questions
6
votes
3 answers
How to concatenate "by chromosome"-VCFs?
I have several VCFs, each of which contains information for only one chromosome. That is, there's a chromosome 1 VCF (with only chr1), a chromosome 2 VCF (with only chr2), etc.
I checked to make sure that these VCFs were valid via VCFtools, i.e.…
ShanZhengYang
6
votes
2 answers
Structural variant calling for low-coverage PacBio data
PacBio is selling ~10x PacBio SEQUEL long reads as an upgrade to Illumina data for SV discovery.
In a clinical setting, the main requirements are proper sensitivity and specificity but also the processing of cohorts, at least families. This requires…
Manuel
6
votes
1 answer
Stable download URLs
One big problem that I'm regularly facing is that URLs for downloading Bioinformatics data (e.g., RefSeq releases or NCBI genome releases) disappear.
Does anyone have any good solution for this?
Manuel
6
votes
2 answers
Does the MAPQ=0 fraction of a BAM file depend on the insert sizes?
When doing Illumina 2x150bp sequencing of genomic DNA and aligning the reads to GRCh38, does the percentage of the non-N fraction of the human genome that has MAPQ=0 depend on the insert sizes of the genomic fragments?
That is, for two identical…
719016
6
votes
1 answer
What is the ICGC normalized_read_count?
I downloaded gene expression data (exp_seq) from the ICGC file browser.
For each sample and gene, the file contains a normalized_read_count.
What is that value? I couldn't find any information on the ICGC website. The values are definitely too low…
Gregor Sturm
6
votes
3 answers
Improve scRNA-seq dataset for further analysis
I got a dataset from a C. elegans scRNA-seq paper:
GSM2599701_Gene.count.matrix.celegans.cell.Rdata in GSE98561_RAW.tar
The dataset is 40,000 x 68,000, where rows represent genes and columns represent cells. So I took it and tried to process it myself to build…
Nikita Vlasenko
6
votes
1 answer
Drawbacks of upper quartile normalization for scRNA-seq data
I would like to use upper quartile normalization for scRNA-seq data, defined as:
The upper quartile (UQ) was proposed by Bullard et al. (2010). Here each column is divided by the 75% quantile of the counts for each library. Often the calculated…
gc5
6
votes
1 answer
Order of batch effects removal, data imputation and library size normalization in scRNA-seq data
I am preprocessing scRNA-seq data. What is the best-practice order for running ComBat for batch effect removal, data imputation (to mitigate dropout), and library size normalization?
I thought that library size normalization should be run first, since it is…
gc5
6
votes
3 answers
Infer missing UTR features in GFF3 file
I am working on a GFF file that is missing the 5'UTR and 3'UTR information. For example:
ctg123 . gene 1050 9000 . + . ID=gene00001;Name=EDEN
ctg123 . mRNA 1050 9000 . + . ID=mRNA00001;Parent=gene00001;Name=EDEN.1
ctg123 .…
l0110
6
votes
1 answer
Filtering imputed GWAS SNPs based on a MAF difference of 10%
There are many posts on the web regarding QC steps pre and post-imputation.
Does applying the (new?) 10% MAF difference rule below make sense? Are there any pitfalls?
Here is the process:
Get the MAF for the imputed set, using SNPTEST with the flag -summary_stats_only
Convert…
zx8754
6
votes
2 answers
What are doublets in single cell RNA-seq data?
I am reading The Tabula Muris Consortium et al. (pp).
In some organs, cells with more than 2 million reads were also excluded as a conservative measure to avoid doublets.
How exactly is a “doublet” defined? For example, is a doublet a set of cells…
gc5
6
votes
1 answer
Scaling by linear regression against the number of reads
I am trying to build the preprocessing pipeline presented in The Tabula Muris Consortium et al. (pp).
It is a pipeline to preprocess single-cell sequencing data. There is one step that is not clear:
Counts were log-normalized (log(1 + counts per…
gc5
6
votes
1 answer
Gene set enrichment analysis on differential phosphorylation sites
I have:
A list of differentially phosphorylated sites in a knockout condition. Some genes contain as many as 70 possible phosphorylation sites; others contain only one.
A list of genes belonging to a specific gene set annotation.
How can I test…
CloudyGloudy
6
votes
3 answers
Show presence of known mutation in RNA-seq data
We have RNA-seq fastq data from control (WT) patients and a patient with a point mutation at a known location in one gene.
I'd like to retrieve the reads aligning to that gene and show the presence of the mutation.
I can think of 2…
Peter
6
votes
4 answers
Convert rs ID of one hg build to rs IDs of another build
I have a list of dbSNP rsIDs for GRCh37 and I want to convert them to the equivalent IDs in GRCh38. This is using the most recent dbSNP build (150 as of the time of this post). Is there any ID mapping available? If not, are there any tools I can…
Rob John