Most Popular

1500 questions
6
votes
4 answers

Merging regions according to their identifier

I have a huge file (20 GB) containing a range of genomic locations, and for each location there is an identifier (4th column), which is sometimes the same. file1.txt chr1 10 20 ABC chr1 13 20 ABC chr1 14 21 ABC chr1 22 27 ABC chr1 29 37…
bapors
  • 171
  • 5
6
votes
1 answer

Run gffread in multi-thread mode

Is it possible to run gffread in multi-thread mode? The answer seems to be 'no' from the manual (or gffread -h), as no multi-thread option is mentioned. I'm mostly using this utility to extract transcript sequences (FASTA) from annotation…
aechchiki
  • 2,676
  • 11
  • 34
6
votes
1 answer

How to efficiently compute the exact percentage of non-unique k-mers in a genome for given k?

I'm looking for some software that can "efficiently" (time and memory) compute the exact percentage of non-unique k-mers in a genome for given k. I don't need the k-mers or the abundances themselves, I just need the percentage. Alternatively, the result…
Jens
  • 69
  • 2
6
votes
1 answer

Salmon output: transcripts quantified with zero reads support

I quantified some samples using Salmon. According to the documentation of the output format, the last column of the 'Quantification File' represents the number of reads that supported the given transcript, namely: NumReads: This is salmon’s estimate…
aechchiki
  • 2,676
  • 11
  • 34
6
votes
1 answer

Does the DNA or RNA of some ethnic groups map better than others' to the human reference genome sequence?

I believe that the human genome reference took DNA samples from different people and that some natural variation is included in extra contigs. However, the human reference genome comes from a limited number of people. Can the limited number of…
Biomagician
  • 2,459
  • 16
  • 30
6
votes
1 answer

Aggregate sequencing/mapping/etc. metrics from cellranger across Illumina samples

I have a number of single-cell projects processed with cellranger from 10x Genomics. The pipeline produces a number of handy metrics that are summarized for each Illumina sample in a web page. These include things like the estimated number of cells,…
Devon Ryan
  • 19,602
  • 2
  • 29
  • 60
6
votes
3 answers

Generating the reconstructed alignment from BAM

I have a (small) BAM file with CIGAR and MD fields. Question 1: What tools exist in Python and/or R to reconstruct the alignment between the reference and the read in a BAM? Given that this is a very standard analysis, I feel that there should be…
ShanZhengYang
  • 1,691
  • 1
  • 14
  • 20
6
votes
1 answer

Classification (supervised learning) of expression data on pathway level

I was wondering if there is any way to apply classification algorithms (e.g. random forest) to microarray data using not the genes as predictors/features but the pathways they belong to. The thing is that the expressions of the genes that belong…
J. Doe
  • 575
  • 3
  • 11
6
votes
1 answer

How to make chromosome color maps for bed ranges

I have genomic .bed file data of 4 different types: A, B, C, and D. Some examples: Type A: 1 101380000 101710000 A 1 110085000 110320000 A Type B: 1 100930000 101335000 B 1 10430000 10560000 B Type C: 1 …
rishi
  • 353
  • 1
  • 8
6
votes
2 answers

5' and 3' bias in RNA-seq data

I'm working with RNA-seq samples. I see 5' bias and also 3' bias in the per-base sequence content plot. From this link I see that the bias at the start of the sequences appears to be the result of biased selection of fragments from the library. And…
stack_learner
  • 1,262
  • 14
  • 26
6
votes
5 answers

Subset smaller BAM to contain several thousand rows from multiple chromosomes

There are many cases where I would like to subset a BAM to create a small file to work with (e.g. algorithmic testing, debugging, etc.). Normally I do the following, which will subset the BAM file.bam and keep the header: samtools view -H…
EB2127
  • 1,413
  • 2
  • 10
  • 23
6
votes
1 answer

Human Cell Atlas - Data availability

A news item from 2017-10-18 on the website of the Human Cell Atlas states: In addition, the consortium today also announced the impending release of gene expression profiles from the first one million immune cells collected under the HCA, toward an…
Gregor Sturm
  • 273
  • 1
  • 6
6
votes
1 answer

SNP located within a promoter region (pig)

I have a couple of SNP identifiers such as MARC0073381 or ALGA0066960. The corresponding platform is Illumina Porcine SNP60 BeadChip (WG-410). I want to know if these SNPs are located within a promoter region of a gene. Bioconductor offers…
6
votes
1 answer

BLAST hits disappearing after changing -evalue

I was teaching an introduction to bioinformatics when the students and I noticed strange BLAST behavior that we couldn't explain. With the default evalue parameter, our best hit shows an evalue of 3e-06, while that hit disappears with an evalue…
H. Gourlé
  • 439
  • 3
  • 8
6
votes
0 answers

Difference between computational biology, bioinformatics and biostatistics

I find that the terms computational biology, bioinformatics and biostatistics are often treated as functionally equivalent, and yet for students selecting PhD programs and the like the difference could be quite significant. Is…
Scott Gigante
  • 2,133
  • 1
  • 13
  • 32