Most Popular
1500 questions
4
votes
1 answer
Extracting features or gene from PCA after calculating PCA for downstream analysis
The quote below is from this paper:
We performed principal component analysis (PCA) of low-coverage sequencing data to identify genes explaining variation across cells. PCA separated the cells into groups corresponding to the source populations…
kcm
- 1,804
- 12
- 27
4
votes
1 answer
HMM Profile from Convergence of retrotransposons in oomycetes and plants
I am trying to recreate the experiment from Convergence of retrotransposons in oomycetes and plants by Kirill Ustyantsev, Alexandr Blinov and Georgy Smyshlyaev. In the "Methods" section an HMM profile was constructed from the amino acid alignment…
A.Dumas
- 497
- 3
- 9
4
votes
2 answers
Correlation network representation
I have trouble understanding Figure 10 of this paper:
Similarity network of participating methods for BPO. Similarities are computed as Pearson’s correlation coefficient between methods, with a 0.75 cutoff for illustration purposes. A…
AyşeBanu
- 41
- 2
4
votes
0 answers
R package equivalent to RSeQC infer_experiment to get strandedness of RNA-Seq
I am currently writing an R package that includes a module to run featureCounts (gene quantification tool) from Rsubread. I wanted to be able to specify the correct strandedness option to featureCounts without the user needing to specify whether…
hmgeiger
- 41
- 2
4
votes
1 answer
Finding data to validate results
My group has a complex dataset, and I would like to validate the results and check the methods I use on a similar dataset (ideally one that has already been studied).
Characteristics of my dataset:
It is related to inflammatory bowel disease…
llrs
- 4,693
- 1
- 18
- 42
4
votes
1 answer
What do 0s in stageR mean?
I'm running stageR as part of a pipeline based on DRIMSeq and wanted to know what it means when the transcript level padj is equal to 0, particularly when the gene is non-significant. I'm pretty sure that in some cases if the padj is 0 it means that…
Sethzard
- 95
- 6
4
votes
2 answers
Gene function annotation - bacterial genome
I'm trying to annotate a genome to find all genes with a specific function. I have a FASTA and the read FASTQs, and I'd like to assign the functional group of the identified proteins (e.g. KEGG Orthology) automatically.
For more context, I have whole…
MichaelKirst
- 41
- 2
4
votes
3 answers
How to specify resources for cluster in snakemake
I am trying to run snakemake version 3.5.4 on my cluster, where jobs are partly defined by a common template built according to the manual and partly through rule-specific parameters. But for some reason the cluster configuration is not expanding…
Kamil S Jaron
- 5,542
- 2
- 25
- 59
4
votes
2 answers
How to align genomic sequence with corresponding amino acid sequence
Does anyone know of a program that can align a genomic sequence with introns with the corresponding amino acid sequence?
I have both the genomic sequence and the correct amino acid sequence but no information on the gene model in, e.g., GFF or GenBank…
user1817
- 43
- 5
4
votes
2 answers
Normalizing microarray data for clustering heat map
I want to generate a clustering heat map for microarray data. This is the first time I'm working with microarray data. I have read some tutorials but still have a few doubts.
I'm using microarray (Affymetrix SNP 6.0 data) gene expression data. For example…
beginner
- 631
- 7
- 15
4
votes
2 answers
Sum of products of mapq and mapped bases for each read in a BAM file
Given a BAM file, I'd like to calculate the sum, over all reads, of the product of the mapping quality and the number of mapped bases (i.e. number of M's in the CIGAR string).
For example, given two reads like this:
read1: mapq=40, mapped_bases=10
read2: mapq=20,…
roblanf
- 962
- 7
- 15
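For the BAM question above, the quantity being asked for can be sketched in plain Python operating on SAM-formatted text lines. This is only an illustration under stated assumptions: the function names `mapped_bases` and `weighted_mapq_sum` and the two example reads are hypothetical and not taken from the question, and only `M` operations in the CIGAR string are counted as mapped bases, as the question describes.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def mapped_bases(cigar):
    """Return the number of bases covered by M operations in a CIGAR string."""
    return sum(int(length) for length, op in CIGAR_OP.findall(cigar) if op == "M")


def weighted_mapq_sum(sam_lines):
    """Sum mapq * mapped_bases over all alignment lines of SAM text."""
    total = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:  # not a complete alignment record
            continue
        mapq, cigar = int(fields[4]), fields[5]
        if cigar == "*":  # no alignment information for this read
            continue
        total += mapq * mapped_bases(cigar)
    return total


# Hypothetical reads for illustration only (not from the question):
# readA: mapq=30, CIGAR 5M2I3M -> 8 mapped bases -> 30 * 8 = 240
# readB: mapq=60, CIGAR 4S6M   -> 6 mapped bases -> 60 * 6 = 360
example_reads = [
    "@HD\tVN:1.6\tSO:coordinate",
    "readA\t0\tchr1\t100\t30\t5M2I3M\t*\t0\t0\tACGTACGTAC\t*",
    "readB\t0\tchr1\t200\t60\t4S6M\t*\t0\t0\tACGTACGTAC\t*",
]
print(weighted_mapq_sum(example_reads))  # 240 + 360 = 600
```

To apply this to a real BAM file, one option is to stream the text output of `samtools view` into `weighted_mapq_sum` line by line; whether unmapped, secondary, or duplicate reads should be excluded first is a decision the question excerpt does not settle.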
4
votes
2 answers
Retrieve abstract/summary for NCBI bookshelf via Entrez
I want to retrieve abstracts/summaries for NCBI bookshelf entries, e.g. "NBK1440".
Docs say the dbname is "books" and efetch guide says a rettype of "docsum" works for all databases. However when I make the…
David Lawrence
- 153
- 5
4
votes
1 answer
Adding entries to bigwig file
I generate bigwig files using a shell script based on bedtools genomecov (to generate a bedgraph from a bam file), bedmap (to compute means across 10 bp bins), and bedGraphToBigWig (to convert the binned bedgraph into bigwig).
Sometimes, the bam file…
bli
- 3,130
- 2
- 15
- 36
4
votes
2 answers
Which identifier do I have to use when I want to add a ##Fasta section to a gff?
I have a gff file like this
scaffold_1 source mRNA 2987526 2992430 . - . ID=protein_id_68892;Name="foobar";locus_tag=my_organism_68892;translation=MHTGDALEGSTGNVSILV...
scaffold_1 source exon 2987526 2987805 . - . name…
Cleb
- 743
- 7
- 18
4
votes
1 answer
Standard Cutoff for Moderated T-statistics
I'm looking at some microarray data. For the first time I've calculated a moderated T statistic from limma.
Is there any standard practice for where to cut off that value? For log2 fold change I usually cut off at +/- 1.5 and adjust accordingly; for…
julianstanley
- 401
- 3
- 9