Most Popular

1500 questions
4
votes
2 answers

Seurat for clustering bulk RNA-seq?

Is it ever OK to use Seurat for clustering bulk samples? I am looking at FPKM data from ~750 bulk RNA-seq samples generated using Cufflinks. As suggested for FPKM data, I manually input log-transformed data into the @data slot [cd138_bm@data <-…
R-Peys
4
votes
1 answer

Calculating bit score: How do you find lambda and K?

To calculate bitscore from score you can use this equation: $S' = \frac{\lambda S - \ln K}{\ln 2}$. If I am trying to manually calculate the bitscore of an HSP of a pairwise blastn alignment, and I know the alignment score, how do I calculate the…
luederm
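As a worked illustration of the bit-score formula quoted in the question above, here is a minimal Python sketch. The function name `bit_score` and the λ and K values in the example are placeholders chosen for illustration, not values taken from the question; in practice they would come from the statistical parameters BLAST reports for the scoring system in use.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Convert a raw alignment score S into a bit score S' = (lambda*S - ln K) / ln 2."""
    if lam <= 0 or k <= 0:
        raise ValueError("lambda and K must be positive")
    return (lam * raw_score - math.log(k)) / math.log(2)


# Hypothetical parameters, for illustration only.
example_lambda = 1.28
example_k = 0.46
print(round(bit_score(50, example_lambda, example_k), 1))
```

With λ = ln 2 and K = 1 the formula reduces to S' = S, which is a convenient sanity check for the implementation.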
4
votes
2 answers

Block wise protein imputation

I am currently working on a dataset that contains 50 samples (10 samples * 5 blocks). The features of the data set are: The data is perfectly balanced between blocks, with equal treatment representation in each block. Each block contains 2 control…
4
votes
1 answer

Reference genome for allele specific expression

We are trying to sort out a pipeline for doing allele-specific expression. Our plan is to call SNPs from RNA-seq data and combine them with known SNP annotations. A well-known problem in ASE is reference bias, where reads are more likely to map if they…
Ian Sudbery
4
votes
1 answer

Seurat with normalized count matrix?

I know that in Seurat the analysis starts with the function CreateSeuratObject, but according to the documentation it accepts a raw count matrix. I have only an already normalized count matrix, so is there a way to work with Seurat…
Nikita Vlasenko
4
votes
1 answer

How to map short sequences to long reads, recovering all multiply-mapped high-quality matches

The dilemma: I have around one million short sequences (21 bp to several hundred bp) and need to identify all of their occurrences in noisy long reads at 20-30x coverage (both PacBio and ONT). All of the short sequences and long reads are…
conchoecia
4
votes
1 answer

bioawk removed part of FASTQ header

I used bioawk -cfastx 'length($seq) > 1 {print "@"$name"\n"$seq"\n+\n"$qual}' in.fq.qz | gzip > out.fq.qz in order to keep a particular read length, but this command shortened the header from @A00199:161:HF3JLDMXX:1:1101:5882:1063…
user977828
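For readers who want to keep the full header line while filtering by read length, one option is to avoid rebuilding the header from parsed fields altogether. The sketch below is plain Python rather than bioawk, and it assumes strict 4-line FASTQ records (no wrapped sequences). The names `read_fastq` and `filter_by_length` and the demo records are made up for illustration.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, separator, quality) from 4-line FASTQ records."""
    it = iter(lines)
    # zip over the same iterator consumes four lines per record;
    # an incomplete trailing record is silently dropped.
    for header, seq, sep, qual in zip(it, it, it, it):
        yield (header.rstrip("\r\n"), seq.rstrip("\r\n"),
               sep.rstrip("\r\n"), qual.rstrip("\r\n"))


def filter_by_length(lines: Iterable[str], min_len: int) -> Iterator[str]:
    """Yield records whose sequence is longer than min_len, header untouched."""
    for header, seq, _sep, qual in read_fastq(lines):
        if len(seq) > min_len:
            # The header is written back verbatim, including any text after the first space.
            yield f"{header}\n{seq}\n+\n{qual}\n"


# Made-up records for demonstration.
demo = ["@read1 1:N:0:ACGT\n", "ACGTAC\n", "+\n", "IIIIII\n",
        "@read2 1:N:0:ACGT\n", "A\n", "+\n", "I\n"]
for record in filter_by_length(demo, 1):
    print(record, end="")
```

For gzipped input, the same functions can be fed from `gzip.open(path, "rt")`, since an open text file is an iterable of lines.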
4
votes
1 answer

Get results of keyword search on Pfam via python script

I'm interested in all proteins that are in any way associated with Danio rerio. I decided to look them up in the Pfam database, and when I just run a keyword search, I get a nice list that looks like this…
4
votes
1 answer

Efficiently aligning a lot of reads on the same small reference sequence

The context: I have a DNA-sequence coding for a protein, about 1500 bp in length. Using NGS, a lot of reads of (mutants of) this same sequence were acquired. All of these reads need to be aligned to the reference. We're talking about a lot of reads…
4
votes
2 answers

Calculating read average length in a Fastq file with bioawk/awk

I found this awk script here: BEGIN { headertype=""; } { if($0 ~ "^@") { countread++; headertype="@"; } else if($0 ~ "^+") { headertype="+"; } else if(headertype="@") { # This is a nuc sequence len=length($0); if…
user977828
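A position-based alternative to the header-matching approach in the quoted script can be sketched in Python. Quality strings may legitimately begin with "@" or "+", so testing line prefixes is unreliable; the sketch below instead assumes strict 4-line FASTQ records and takes every fourth line starting from the second. The function name `mean_read_length` is hypothetical.

```python
from itertools import islice
from typing import Iterable


def mean_read_length(lines: Iterable[str]) -> float:
    """Return the average sequence length over 4-line FASTQ records."""
    total = 0
    count = 0
    # In a 4-line record the sequence is the 2nd line: indices 1, 5, 9, ...
    for seq in islice(lines, 1, None, 4):
        total += len(seq.rstrip("\r\n"))
        count += 1
    if count == 0:
        raise ValueError("no reads found")
    return total / count


# Made-up records; note the second quality line starts with "@".
demo = ["@a\n", "ACGT\n", "+\n", "IIII\n",
        "@b\n", "AC\n", "+\n", "@I\n"]
print(mean_read_length(demo))
```

The function accepts any iterable of lines, so an open file handle (or `gzip.open(path, "rt")` for compressed input) can be passed directly.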
4
votes
1 answer

Combine VCF files

I have a problem using rbind to combine VCF files with the VariantAnnotation library from Bioconductor. I am reading two VCF files; when I try to combine them in a certain order with rbind, I get an error. When I combine them in a…
Kozolovska
4
votes
1 answer

RNA-Seq: clustering/treatment of genes with low expression

I have some RNA-Seq data from leukaemia patients. I want to do unsupervised clustering on them together with some other published leukaemia RNA-Seq data and see how they cluster. There are a few problems I encountered while doing this. I read mixed messages…
Kent
4
votes
2 answers

Omics data: How to interpret heatmap and dendrogram output?

How to interpret heatmap and dendrogram output for biological (omics) data in words (when writing results and discussion)? What should I consider (the statistics behind it?) and what is the best approach? Here is one of my heatmaps for proteomics data. Script…
Kynda
4
votes
1 answer

Plot to show the expression of genes between tumor and normal

I have RNA-seq raw count data for 50 samples: 20 normal and 30 tumor. After differential analysis I got 30 DEGs. I want to make a violin plot showing the expression of each gene. I transformed counts to logCPM. counts: Genes Tumor1 Tumor2 …
beginner
4
votes
1 answer

Size of the pathways for analysis and filtration

I have recently started working on a substance's effect on a cell line at different dosages. For this, I am using a tool called bmdexpress2. Its input is the normalized counts from RNA-seq for each dosage as one big matrix. When it comes…