Highest Voted Questions - Bioinformatics Stack Exchange

4

votes

2 answers

Seurat for clustering bulk RNA-seq?

Is it ever ok to use Seurat for clustering bulk samples? I am looking at FPKM data from ~750 bulk RNA-seq samples generated using Cufflinks. As suggested for FPKM data, I manually input log transformed data to the @data slot [cd138_bm@data <-…

asked Oct 04 '18 at 17:06

R-Peys

51
1
3

4

votes

1 answer

Calculating bit score: How do you find lambda and K?

To calculate the bit score from the raw score you can use this equation: $S' = (\lambda S - \ln K) / \ln 2$. If I am trying to manually calculate the bit score of an HSP of a pairwise blastn alignment, and I know the alignment score, how do I calculate the…

asked Sep 28 '18 at 16:56

luederm

43
4
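For the bit score formula quoted in the question above, a minimal Python sketch of the arithmetic may help. The function name `bit_score` and the parameter values in the demo are illustrative assumptions, not values from the question. The real lambda and K depend on the scoring system used; BLAST text reports usually list them in their statistics footer.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Convert a raw alignment score S into a bit score.

    Implements S' = (lambda * S - ln K) / ln 2, the formula quoted
    in the question. `lam` and `k` are the Karlin-Altschul parameters
    of the scoring system that produced `raw_score`.
    """
    if lam <= 0 or k <= 0:
        raise ValueError("lambda and K must both be positive")
    return (lam * raw_score - math.log(k)) / math.log(2)


if __name__ == "__main__":
    # Placeholder parameters chosen only to make the arithmetic easy
    # to follow; they are NOT real blastn values.
    example_lambda = 1.0
    example_k = 1.0
    print(bit_score(100, example_lambda, example_k))
```

With K = 1 the logarithm term vanishes, so the result is simply the raw score rescaled from nats to bits.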

4

votes

2 answers

Block wise protein imputation

I am currently working on a dataset that contains 50 samples (10 samples * 5 blocks). The features of the data set are: The data is perfectly balanced between blocks, with equal treatment representation in each block. Each block contains 2 control…

asked Sep 28 '18 at 14:17

Whateversclever

51
2

4

votes

1 answer

Reference genome for allele specific expression

We are trying to sort out a pipeline for doing allele-specific expression. Our plan is to call SNPs from RNA-seq data and combine them with known SNP annotations. A well-known problem in ASE is reference bias, where reads are more likely to map if they…

asked Sep 28 '18 at 12:03

Ian Sudbery

3,311
1
11
21

4

votes

1 answer

Seurat with normalized count matrix?

I know that in Seurat the analysis starts from the function CreateSeuratObject, but according to the documentation it accepts a raw count matrix. I only have an already normalized count matrix, so is there a way to work with Seurat…

asked Sep 24 '18 at 18:35

Nikita Vlasenko

2,558
3
26
38

4

votes

1 answer

How to map short sequences to long reads, recovering all multiply-mapped high-quality matches

The dilemma: I have around one million short sequences (21 bp to several hundred bp) and need to identify all of their occurrences in noisy long reads at 20-30x coverage (both PacBio and ONT). All of the short sequences and long reads are…

asked Sep 06 '18 at 07:56

conchoecia

3,141
2
16
40

4

votes

1 answer

bioawk removed part of FASTQ header

I used bioawk -cfastx 'length($seq) > 1 {print "@"$name"\n"$seq"\n+\n"$qual}' in.fq.qz | gzip > out.fq.qz in order to keep a particular read length, but this command shortened the header from @A00199:161:HF3JLDMXX:1:1101:5882:1063…

bioawk

asked Sep 05 '18 at 03:09

user977828

453
3
9

4

votes

1 answer

Get results of keyword search on Pfam via python script

I'm interested in all proteins that are in any way associated with Danio rerio. I decided to look them up in the Pfam database, and when I just run a keyword search, I get a nice list which looks like this…

asked Aug 23 '18 at 16:13

Đorđe Relić

143
2

4

votes

1 answer

Efficiently aligning a lot of reads on the same small reference sequence

The context: I have a DNA-sequence coding for a protein, about 1500 bp in length. Using NGS, a lot of reads of (mutants of) this same sequence were acquired. All of these reads need to be aligned to the reference. We're talking about a lot of reads…

asked Aug 19 '18 at 08:51

Cedric Stroobandt

43
3

4

votes

2 answers

Calculating read average length in a Fastq file with bioawk/awk

I found here this awk script: BEGIN { headertype=""; } { if($0 ~ "^@") { countread++; headertype="@"; } else if($0 ~ "^+") { headertype="+"; } else if(headertype="@") { # This is a nuc sequence len=length($0); if…

asked Aug 18 '18 at 23:43

user977828

453
3
9
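The awk script quoted in the question above classifies lines by their leading `@` or `+`, which can misfire because a quality string may itself begin with either character. As a hedged alternative, here is a minimal Python sketch that relies on record position instead. It assumes a strict four-line (unwrapped) FASTQ layout; the function names are illustrative, not from the original post.

```python
import gzip
import io


def mean_read_length(handle) -> float:
    """Return the mean sequence length of a 4-line-per-record FASTQ stream."""
    total = 0
    count = 0
    for index, line in enumerate(handle):
        # In unwrapped FASTQ the sequence is always the second line
        # of each four-line record (0-based offset 1).
        if index % 4 == 1:
            total += len(line.rstrip("\r\n"))
            count += 1
    if count == 0:
        raise ValueError("no FASTQ records found")
    return total / count


def mean_read_length_from_path(path: str) -> float:
    """Open a plain or gzip-compressed FASTQ file and compute the mean length."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return mean_read_length(handle)


if __name__ == "__main__":
    # Tiny in-memory example; the first quality line starts with "@",
    # which is exactly the case that prefix-based detection gets wrong.
    demo = io.StringIO("@r1\nACGT\n+\n@III\n@r2\nACGTAC\n+\nIIIIII\n")
    print(mean_read_length(demo))
```

Counting by position keeps the logic independent of header and quality content, at the cost of not supporting multi-line (wrapped) FASTQ records.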

4

votes

1 answer

Combine VCF files

I have a problem using rbind to combine VCF files with the VariantAnnotation library from Bioconductor. I am reading two VCF files; when I try to combine them in a certain order with rbind, I get an error. When I combine them in a…

asked Aug 16 '18 at 06:22

Kozolovska

241
1
4

4

votes

1 answer

RNA-Seq: clustering/treatment of genes with low expression

I have some RNA-Seq data from leukaemia patients. I want to do unsupervised clustering on them together with some other published leukaemia RNA-Seq data and see how they cluster. There are a few problems I encountered while doing this. I read mixed messages…

asked Aug 15 '18 at 17:43

Kent

105
6

4

votes

2 answers

Omics data: How to interpret heatmap and dendrogram output?

How do I interpret heat map and dendrogram output for biological (omics) data in words, when writing the results and discussion? What should I consider (the statistics behind it?) and what is the best approach? Here is one of my heat maps for proteomics data. Script…

asked Jul 28 '18 at 12:05

Kynda

95
1
1
6

4

votes

1 answer

Plot to show the expression of genes between tumor and normal

I have raw RNA-seq count data for 50 samples: 20 normal and 30 tumor. After differential expression analysis I got 30 DEGs. I want to make a violin plot showing the expression of each gene. I transformed the counts to logCPM. counts: Genes Tumor1 Tumor2 …

asked Jul 17 '18 at 13:09

beginner

631
7
15

4

votes

1 answer

size of the pathways for analysis and filtration

I have recently started studying the effect of a substance on a cell line at different dosages. For this, I am using a tool called bmdexpress2. Its input is the normalized counts from RNA-seq for each dosage as one big matrix. When it comes…

asked Jul 12 '18 at 08:24

Fırat Uyulur

41
3
