Most Popular
1500 questions
4 votes, 2 answers
Seurat for clustering bulk RNA-seq?
Is it ever ok to use Seurat for clustering bulk samples?
I am looking at FPKM data from ~750 bulk RNA-seq samples generated using Cufflinks. As suggested for FPKM data, I manually put log-transformed data into the @data slot [cd138_bm@data <-…
R-Peys
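Seurat's workflow for this step (log transform, variable genes, PCA, then clustering on the PCs) can also be run directly on a bulk matrix. A minimal NumPy sketch, not Seurat itself, assuming a genes × samples FPKM array, a pseudocount of 1 and hypothetical cut-offs for the number of variable genes and PCs; any clustering method (k-means, hierarchical) would then run on the returned PC scores.

```python
import numpy as np

def bulk_pca(fpkm: np.ndarray, n_top: int = 2000, n_pcs: int = 10) -> np.ndarray:
    """Return sample scores on the first n_pcs principal components.

    fpkm: genes x samples FPKM matrix (assumed layout).
    n_top, n_pcs: hypothetical defaults, not values from the question.
    """
    logx = np.log2(fpkm + 1.0)                  # log transform with pseudocount 1
    var = logx.var(axis=1)
    top = np.argsort(var)[::-1][:n_top]         # indices of the most variable genes
    x = logx[top].T                             # samples x genes
    x = x - x.mean(axis=0)                      # centre each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    k = min(n_pcs, s.size)
    return u[:, :k] * s[:k]                     # samples x PCs (PC scores)
```

Ranking genes by raw variance of log values is the simplest choice; Seurat's own variable-feature selection uses a different, variance-stabilised criterion.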
4 votes, 1 answer
Calculating bit score: How do you find lambda and K?
To calculate a bit score from a raw score, you can use this equation: $S' = (\lambda S - \ln K) / \ln 2$
If I am trying to manually calculate the bitscore of an HSP of a pairwise blastn alignment, and I know the alignment score, how do I calculate the…
luederm
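Once λ and K are known, the conversion is a one-liner. For ungapped scoring, λ is the unique positive root of $\sum_{i} p_i e^{\lambda s_i} = 1$ over the aligned-pair scores $s_i$ and their probabilities $p_i$, so it can be found numerically; K has no comparably simple closed form, and for gapped alignments both parameters are estimated empirically, so in practice they are usually read from the BLAST report (the Lambda and K lines near the end of the plain-text output). A sketch, assuming uniform base frequencies and a hypothetical +1/-2 match/mismatch scheme for the λ example:

```python
import math
from typing import Sequence, Tuple

def bit_score(raw_score: float, lam: float, k: float) -> float:
    """S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def ungapped_lambda(score_probs: Sequence[Tuple[float, float]]) -> float:
    """Solve sum(p * exp(lambda * s)) = 1 for the unique lambda > 0 by bisection.

    score_probs: (score, probability) pairs for aligned letter pairs.
    Valid only for UNGAPPED scoring with a negative expected score
    and at least one positive score.
    """
    if sum(p * s for s, p in score_probs) >= 0 or max(s for s, _ in score_probs) <= 0:
        raise ValueError("need a negative expected score and a positive score")

    def f(lam: float) -> float:
        return sum(p * math.exp(lam * s) for s, p in score_probs) - 1.0

    lo, hi = 1e-6, 1.0          # f(lo) < 0 just above zero for typical schemes
    while f(hi) < 0:            # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):        # plain bisection
        mid = (lo + hi) / 2.0
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

With uniform base frequencies, a match (+1) occurs with probability 4 × 1/16 = 0.25 and a mismatch (-2) with probability 0.75, so the call is `ungapped_lambda([(1, 0.25), (-2, 0.75)])`.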
4 votes, 2 answers
Block-wise protein imputation
I am currently working on a dataset that contains 50 samples (10 samples * 5 blocks). The features of the data set are:
The data is perfectly balanced between blocks, with equal treatment representation in each block. Each block contains 2 control…
Whateversclever
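The question's actual imputation method is not shown. One simple block-aware option is to fill each missing value with the median of the same protein within the same block, so no information is borrowed across blocks. A NumPy sketch, assuming a samples × proteins matrix with NaN for missing values and one block label per sample; it also assumes values are missing at random, whereas left-censored proteomics data usually call for other methods.

```python
import warnings
import numpy as np

def impute_within_block(x: np.ndarray, blocks: np.ndarray) -> np.ndarray:
    """Replace NaNs by the per-protein median of the same block.

    x: samples x proteins, NaN = missing (assumed layout).
    blocks: length-n_samples array of block labels.
    Proteins with no observed value in a block stay NaN.
    """
    out = x.copy()
    for b in np.unique(blocks):
        rows = blocks == b
        sub = out[rows]                          # boolean indexing returns a copy
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", category=RuntimeWarning)  # all-NaN columns
            med = np.nanmedian(sub, axis=0)      # per-protein median within this block
        r, c = np.where(np.isnan(sub))
        sub[r, c] = med[c]
        out[rows] = sub                          # write the block back
    return out
```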
4 votes, 1 answer
Reference genome for allele specific expression
We are trying to sort out a pipeline for doing allele-specific expression. Our plan is to call SNPs from RNA-seq data and combine them with known SNP annotations. A well-known problem in ASE is reference bias, where reads are more likely to map if they…
Ian Sudbery
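One common mitigation for reference bias is to mask known SNP positions in the reference with N before alignment, so reads carrying either allele mismatch the reference at that site in the same way (another is aligning to a personalised or diploid reference). A minimal sketch of the masking step on an in-memory sequence, assuming 0-based positions; a real pipeline would apply it per chromosome to the FASTA.

```python
from typing import Iterable

def mask_snps(seq: str, positions: Iterable[int]) -> str:
    """Return seq with every (0-based) SNP position replaced by 'N'."""
    chars = list(seq)
    for pos in positions:
        if not 0 <= pos < len(chars):
            raise IndexError(f"SNP position {pos} outside sequence of length {len(chars)}")
        chars[pos] = "N"
    return "".join(chars)
```

VCF `POS` values are 1-based, so they need 1 subtracted before being passed in.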
4 votes, 1 answer
Seurat with normalized count matrix?
I know that in Seurat the analysis starts with the function CreateSeuratObject, but according to the documentation it accepts a raw count matrix. I only have an already normalized count matrix, so is there a way to work with Seurat…
Nikita Vlasenko
4 votes, 1 answer
How to map short sequences to long reads, recovering all multiply-mapped high-quality matches
The dilemma:
I have around one million short sequences (21 bp to several hundred base pairs) for which I need to identify all occurrences in noisy long reads at 20-30x coverage (both PacBio and ONT). All of the short sequences and long reads are…
conchoecia
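Because the long reads are noisy, exact matching misses occurrences; what is needed is every location where a short sequence aligns within some edit distance. The textbook way to report all such hits is semi-global dynamic programming (Sellers' algorithm): starting the match anywhere in the read costs nothing, so each DP column gives the best alignment ending at that read position. A pure-Python sketch for illustration only, assuming a hypothetical `max_edits` threshold; at a million queries against 20-30x long-read data a dedicated, indexed tool would be needed.

```python
from typing import List, Tuple

def approx_occurrences(pattern: str, text: str, max_edits: int) -> List[Tuple[int, int]]:
    """Return (end_position, edit_distance) for every text position where
    some substring ending there matches pattern with <= max_edits edits.
    end_position is 0-based and inclusive.
    """
    m = len(pattern)
    prev = list(range(m + 1))            # column before any text: D[i] = i
    hits = []
    for j, t in enumerate(text):
        cur = [0] * (m + 1)              # D[0] = 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == t else 1
            cur[i] = min(prev[i - 1] + cost,   # match / substitution
                         prev[i] + 1,          # extra base in the read
                         cur[i - 1] + 1)       # base missing from the read
        if cur[m] <= max_edits:
            hits.append((j, cur[m]))
        prev = cur
    return hits
```

Overlapping end positions around one true hit are expected; keeping the local minimum of the distance per region is a common post-processing step.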
4 votes, 1 answer
bioawk removed part of FASTQ header
I used
bioawk -cfastx 'length($seq) > 1 {print "@"$name"\n"$seq"\n+\n"$qual}' in.fq.gz | gzip > out.fq.gz
in order to keep a particular read length, but this command shortened the header from
@A00199:161:HF3JLDMXX:1:1101:5882:1063…
user977828
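In bioawk's fastx mode, `$name` holds only the header up to the first whitespace and the rest is stored in `$comment`, so printing `"@"$name" "$comment` keeps the full header. An alternative that never reparses the header is to copy the header line verbatim. A Python sketch, assuming unwrapped four-line FASTQ records:

```python
from typing import Iterable, Iterator

def filter_fastq(lines: Iterable[str], min_len: int) -> Iterator[str]:
    """Yield 4-line FASTQ records whose sequence is longer than min_len,
    copying the header line unchanged (assumes unwrapped 4-line records)."""
    it = iter(lines)
    for header in it:
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        if len(seq.rstrip("\r\n")) > min_len:
            yield from (header, seq, plus, qual)
```

For gzipped files, the input and output handles can come from `gzip.open(path, "rt")` and `gzip.open(path, "wt")`, with the output written via `writelines(filter_fastq(fin, 1))`.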
4 votes, 1 answer
Get results of keyword search on Pfam via python script
I'm interested in all proteins that are in any way associated with Danio rerio. I decided to look them up in the Pfam database, and when I do a simple keyword search I get a nice list that looks like this…
Đorđe Relić
4 votes, 1 answer
Efficiently aligning a lot of reads on the same small reference sequence
The context: I have a DNA sequence coding for a protein, about 1500 bp in length. Using NGS, a lot of reads of (mutants of) this same sequence were acquired. All of these reads need to be aligned to the reference. We're talking about a lot of reads…
Cedric Stroobandt
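With a single ~1500 bp reference, one cheap strategy is to index the reference's k-mers once and place each read by voting on the diagonal (reference position minus read position) of its k-mer hits; a banded alignment around the winning offset would then give the final alignment. A minimal sketch of the seeding and voting step only, assuming forward-strand reads and a hypothetical k chosen by the caller:

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional

def kmer_index(ref: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the reference to its start positions (built once)."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index

def best_offset(read: str, index: Dict[str, List[int]], k: int) -> Optional[int]:
    """Most-voted reference offset (ref_pos - read_pos) for a read, or None."""
    votes: Counter = Counter()
    for j in range(len(read) - k + 1):
        for i in index.get(read[j:j + k], ()):   # .get does not insert into defaultdict
            votes[i - j] += 1
    return votes.most_common(1)[0][0] if votes else None
```

Point mutations only remove the votes of the k-mers that overlap them, so mutant reads still land on the correct diagonal as long as enough k-mers remain intact.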
4 votes, 2 answers
Calculating average read length in a FASTQ file with bioawk/awk
I found this awk script here:
BEGIN {
    headertype="";
}
{
    if($0 ~ "^@") {
        countread++;
        headertype="@";
    }
    else if($0 ~ "^+") {
        headertype="+";
    }
    else if(headertype="@") { # This is a nuc sequence
        len=length($0);
        if…
user977828
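Because quality strings can themselves begin with `@` or `+`, classifying lines by their first character is fragile (note also that `headertype="@"` in the quoted script is an assignment rather than a comparison, so that branch runs for every line not starting with `@` or `+`). Assuming standard unwrapped four-line records, the sequence is simply every fourth line starting at the second; with bioawk, `bioawk -c fastx '{n++; s += length($seq)} END {print s/n}' in.fq` does the same. A Python sketch of the four-line approach:

```python
from typing import Iterable

def mean_read_length(lines: Iterable[str]) -> float:
    """Mean sequence length of a FASTQ stream with unwrapped 4-line records."""
    total = count = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:                      # 2nd line of each record is the sequence
            total += len(line.rstrip("\r\n"))
            count += 1
    if count == 0:
        raise ValueError("no FASTQ records found")
    return total / count
```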
4 votes, 1 answer
Combine VCF files
I have a problem using rbind to combine VCF files with the Bioconductor package VariantAnnotation.
I am reading two VCF files; when I try to combine them in a certain order with rbind, I get an error. When I combine them in a…
Kozolovska
4 votes, 1 answer
RNA-Seq: clustering/treatment of genes with low expression
I have some RNA-Seq data from leukaemia patients. I want to do unsupervised clustering on them together with some other published leukaemia RNA-Seq data and see how they cluster. There are a few problems I encountered while doing this.
I have read mixed messages…
Kent
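A widely used convention in count-based workflows is to keep genes whose counts per million (CPM) reach a threshold in at least as many samples as the smallest group, and to cluster on log values of the kept genes. A NumPy sketch, assuming a genes × samples matrix of raw counts and hypothetical thresholds of 1 CPM in at least 3 samples:

```python
import numpy as np

def filter_low_expression(counts: np.ndarray, min_cpm: float = 1.0,
                          min_samples: int = 3) -> np.ndarray:
    """Boolean mask of genes with CPM >= min_cpm in >= min_samples samples.

    counts: genes x samples matrix of raw counts (assumed layout).
    min_cpm, min_samples: hypothetical defaults; tune to group sizes.
    """
    lib_sizes = counts.sum(axis=0)              # total reads per sample
    cpm = counts / lib_sizes * 1e6              # counts per million, per sample
    return (cpm >= min_cpm).sum(axis=1) >= min_samples
```

The mask is then applied as `counts[mask]` before log transformation and clustering; if the published data are only available as FPKM/TPM, the same idea applies with a threshold on those units instead.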
4 votes, 2 answers
Omics data: How to interpret heatmap and dendrogram output?
How should I describe heatmap and dendrogram output for biological (omics) data in words (when writing the results and discussion)?
What should I consider (e.g. the underlying statistics), and what is the best approach?
Here is one of my heatmaps for proteomics data.
Script…
Kynda
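Two choices determine what a heatmap and its dendrogram say, and both belong in the written description. The colours are often per-feature z-scores, so they compare samples within one protein rather than absolute abundance; the tree is built from a distance between samples (for example 1 - Pearson correlation) plus a linkage rule. A NumPy sketch of those two quantities, assuming a features × samples matrix; the tree itself would typically come from a linkage function such as `scipy.cluster.hierarchy.linkage`.

```python
import numpy as np

def row_zscore(x: np.ndarray) -> np.ndarray:
    """Scale each feature (row) to mean 0 and SD 1 across samples."""
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = 1.0                    # constant rows become 0 instead of NaN
    return (x - mu) / sd

def correlation_distance(x: np.ndarray) -> np.ndarray:
    """Samples x samples distance 1 - Pearson r, computed between columns."""
    return 1.0 - np.corrcoef(x, rowvar=False)
```

Two samples joined low in the tree are those with small distance, i.e. highly correlated profiles; the height of a merge is the distance at which the clusters join under the chosen linkage.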
4 votes, 1 answer
Plot to show the expression of genes between tumor and normal
I have RNA-seq raw count data for 50 samples (20 normal and 30 tumor). After differential expression analysis I got 30 DEGs. I want to make a violin plot showing the expression of each gene. I transformed counts to logCPM.
counts:
Genes Tumor1 Tumor2 …
beginner
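For one violin per gene and group, the log-CPM matrix is usually reshaped to long format (one row per gene, sample and group), so each gene can be a facet and the group goes on the x-axis. A NumPy sketch, assuming a genes × samples count matrix, per-sample group labels and a hypothetical pseudocount of 0.5; edgeR's `cpm()` with `log = TRUE` adds its own prior count, so its values will differ slightly.

```python
import numpy as np
from typing import List, Sequence, Tuple

def log_cpm(counts: np.ndarray, pseudocount: float = 0.5) -> np.ndarray:
    """log2 counts per million; counts is genes x samples (assumed layout)."""
    lib_sizes = counts.sum(axis=0)              # library size per sample
    return np.log2((counts + pseudocount) / lib_sizes * 1e6)

def to_long(values: np.ndarray, genes: Sequence[str], samples: Sequence[str],
            groups: Sequence[str]) -> List[Tuple[str, str, str, float]]:
    """(gene, sample, group, value) rows, one per matrix cell, for plotting."""
    return [(g, s, grp, float(values[i, j]))
            for i, g in enumerate(genes)
            for j, (s, grp) in enumerate(zip(samples, groups))]
```

The resulting rows can be loaded into a data frame and passed to any plotting library that draws violins by group.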
4 votes, 1 answer
Size of the pathways for analysis and filtering
I have recently started studying the effect of a substance on a cell line at different doses. For this I am using a tool called bmdexpress2, whose input is a large matrix of normalized RNA-seq counts for each dose. When it comes…
Fırat Uyulur
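Pathway size filters are normally applied to the number of pathway genes actually present in the data rather than the full annotation. A sketch, assuming gene sets given as a dict of pathway name to gene IDs and bounds of 15 and 500 genes (the GSEA defaults, used here only as an example; bmdexpress2's own settings may differ):

```python
from typing import Dict, Iterable, Set

def filter_gene_sets(gene_sets: Dict[str, Iterable[str]], measured: Set[str],
                     min_size: int = 15, max_size: int = 500) -> Dict[str, Set[str]]:
    """Keep pathways whose overlap with the measured genes lies in [min_size, max_size]."""
    kept = {}
    for name, genes in gene_sets.items():
        overlap = set(genes) & measured          # only genes present in the matrix
        if min_size <= len(overlap) <= max_size:
            kept[name] = overlap
    return kept
```

Very small sets give unstable enrichment statistics, while very large ones are too broad to interpret, which is the usual rationale for both bounds.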