Most Popular
1500 questions
6
votes
4 answers
Merging regions according to their identifier
I have a huge file (20 GB) containing a range of genomic locations, and each location has an identifier (4th column), which is sometimes the same.
file1.txt
chr1 10 20 ABC
chr1 13 20 ABC
chr1 14 21 ABC
chr1 22 27 ABC
chr1 29 37…
bapors
- 171
- 5
6
votes
1 answer
Run gffread in multi-thread mode
Is it possible to run gffread in multi-thread mode? The answer seems to be 'no' from the manual (or gffread -h), as no multi-thread option is mentioned.
I'm mostly using this utility to extract transcript sequences (FASTA) from annotation…
aechchiki
- 2,676
- 11
- 34
6
votes
1 answer
How to efficiently compute the exact percentage of non-unique k-mers in a genome for given k?
I'm looking for some software that can "efficiently" (time and memory) compute the exact percentage of non-unique k-mers in a genome for given k. I don't need the k-mers or the abundances themselves; I just need the percentage.
Alternatively, the result…
Jens
- 69
- 2
6
votes
1 answer
Salmon output: transcripts quantified with zero reads support
I quantified some samples using Salmon. According to the documentation of output format, the last column of the 'Quantification File' represents the number of reads that supported the given transcript, namely:
NumReads — This is salmon’s estimate…
aechchiki
- 2,676
- 11
- 34
6
votes
1 answer
Does the DNA or RNA of some ethnic groups map better than others' to the human reference genome sequence?
I believe that the human reference genome was built from DNA samples from different people and that some natural variation is included in extra contigs.
However, the human reference genome comes from a limited number of people.
Can the limited number of…
Biomagician
- 2,459
- 16
- 30
6
votes
1 answer
Aggregate sequencing/mapping/etc. metrics from cellranger across Illumina samples
I have a number of single-cell projects processed with cellranger from 10x Genomics. The pipeline produces a number of handy metrics that are summarized for each Illumina sample in a web page. These include things like the estimated number of cells,…
Devon Ryan
- 19,602
- 2
- 29
- 60
6
votes
3 answers
Generating the reconstructed alignment from BAM
I have a (small) BAM file with CIGAR and MD fields.
Question 1: What tools exist in Python and/or R to reconstruct the alignment between the reference and the read in a BAM? Given that this is a very standard analysis, I feel that there should be…
ShanZhengYang
- 1,691
- 1
- 14
- 20
6
votes
1 answer
Classification (supervised learning) of expression data on pathway level
I was wondering if there is any way to apply classification algorithms (e.g., random forest) to microarray data using not the genes as predictors/features but the pathways they belong to.
The thing is that the expressions of the genes that belong…
J. Doe
- 575
- 3
- 11
6
votes
1 answer
How to make chromosome color maps for bed ranges
I have genomic .bed file data of 4 different types: A, B, C, and D. Here are some examples:
Type A:
1 101380000 101710000 A
1 110085000 110320000 A
Type B:
1 100930000 101335000 B
1 10430000 10560000 B
Type C:
1 …
rishi
- 353
- 1
- 8
6
votes
2 answers
5' and 3' bias in RNA-seq data
I'm working with RNA-seq samples. I see 5' bias and also 3' bias in the per-base sequence content plot. From this link I see that the bias at the start of the sequences appears to be the result of biased selection of fragments from the library. And…
stack_learner
- 1,262
- 14
- 26
6
votes
5 answers
Subset smaller BAM to contain several thousand rows from multiple chromosomes
There are many cases where I would like to subset a BAM to create a small file to work with (e.g., for algorithmic testing or debugging).
Normally I do the following, which subsets the BAM file.bam and keeps the header:
samtools view -H…
EB2127
- 1,413
- 2
- 10
- 23
6
votes
1 answer
Human Cell Atlas - Data availability
A news item from 2017-10-18 on the website of the Human Cell Atlas states:
In addition, the consortium today also announced the impending release of gene expression profiles from the first one million immune cells collected under the HCA, toward an…
Gregor Sturm
- 273
- 1
- 6
6
votes
1 answer
SNP located within a promoter region (pig)
I have a couple of SNP identifiers such as MARC0073381 or ALGA0066960. The corresponding platform is Illumina Porcine SNP60 BeadChip (WG-410).
I want to know if these SNPs are located within a promoter region of a gene.
Bioconductor offers…
easelpeasel
- 61
- 2
6
votes
1 answer
BLAST hits disappearing after changing -evalue
I was teaching an introduction to bioinformatics when the students and I noticed strange BLAST behavior that we couldn't explain.
With the default evalue parameter, our best hit shows an evalue of 3e-06, while that hit disappears with an evalue…
H. Gourlé
- 439
- 3
- 8
6
votes
0 answers
Difference between computational biology, bioinformatics and biostatistics
I find that in many contexts the terms computational biology, bioinformatics and biostatistics are treated as functionally equivalent, and yet for students selecting PhD programs and the like, the difference could be quite significant. Is…
Scott Gigante
- 2,133
- 1
- 13
- 32