Highest Voted Questions - Bioinformatics Stack Exchange

5

votes

1 answer

Does Picard MarkDuplicates toggle the PCR duplicate SAM flag?

I have an RNA-seq BAM file, and a few reads in it are puzzling me. According to the BAM header, the file is sorted by coordinate, was created using TopHat, and no MarkDuplicates step was run. But some reads are marked as duplicates in the…

asked Oct 27 '17 at 14:49

svural

151
2
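For the duplicate-flag question above, a minimal sketch of how one might check whether records carry the duplicate bit, assuming the alignments are available as SAM text lines (for example from `samtools view`). In the SAM specification, bit 0x400 of the FLAG field marks a PCR or optical duplicate. The helper names `is_duplicate` and `count_duplicates` are hypothetical, and the sketch only inspects flags; it says nothing about which tool set them.

```python
# Bit 0x400 (decimal 1024) of the SAM FLAG field marks a PCR or optical duplicate.
DUPLICATE_FLAG = 0x400


def is_duplicate(flag: int) -> bool:
    """Return True if the duplicate bit is set in a SAM FLAG value."""
    return bool(flag & DUPLICATE_FLAG)


def count_duplicates(sam_lines) -> int:
    """Count alignment records whose FLAG has the duplicate bit set.

    Header lines (starting with '@') and blank lines are skipped.
    FLAG is the second tab-separated column of a SAM record.
    """
    n = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        flag = int(line.split("\t")[1])
        if is_duplicate(flag):
            n += 1
    return n
```

Passing an open text file (or any iterable of SAM lines) to `count_duplicates` gives a quick tally to compare against what the header claims was run.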

5

votes

4 answers

How to map PDB chains to Uniprot IDs using API services

I have many PDB IDs, and I need to get the UniProt FASTA sequences for specific chains of these PDB IDs via API services. For example, imagine that I need the FASTA sequence of chain 'A' of '1kf6'. The UniProt entry (accession) for this '1kf6' chain is…

asked Oct 24 '17 at 18:48

Sara

777
1
6
18

5

votes

2 answers

Creating a new reference genome B by changing genome A with mismatches from BAM file without long reads

This question is slightly related to this one: Improve a reference genome with sequencing data. Only in my case, I have a starting reference genome A, of a short phage genome, and reads from a related genome B, for which I haven't been able to find a…

asked Oct 24 '17 at 08:14

719016

2,324
13
19

5

votes

0 answers

Help with identifying disease modules

I've made an application that at this point ranks all combinations of drug pairs relevant to a biological network/graph in the order of how disruptive the outcome of deleting the targets of a given drug pair is. So, in short, my applications…

asked Oct 17 '17 at 10:48

user1171426

109
2

5

votes

3 answers

How can I only get the species name for fasta sequences from blast results?

I am trying to make a phylogenetic tree from sequences obtained with BLAST. I have files that contain hundreds of FASTA protein sequences, which are all named like >NP_567483.1 transcriptional regulator [Pseudomonas aeruginosa PAO1]. But if I use…

asked Oct 15 '17 at 17:10

JulianT

51
2
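For the species-name question above, a minimal sketch assuming every header ends with the organism in square brackets, as in the example header quoted in the question. The function name `species_from_header` and the `genus_species_only` option are hypothetical; headers that do not follow this pattern return `None` rather than a guess.

```python
import re

# Matches the last [...] group at the end of a FASTA header line.
SPECIES_RE = re.compile(r"\[([^\[\]]+)\]\s*$")


def species_from_header(header: str, genus_species_only: bool = False):
    """Extract the bracketed organism name from a FASTA header.

    If genus_species_only is True, keep only the first two words
    (dropping a trailing strain label such as 'PAO1').
    Returns None when no trailing bracketed name is present.
    """
    match = SPECIES_RE.search(header)
    if match is None:
        return None
    name = match.group(1).strip()
    if genus_species_only:
        name = " ".join(name.split()[:2])
    return name


header = ">NP_567483.1 transcriptional regulator [Pseudomonas aeruginosa PAO1]"
print(species_from_header(header))
print(species_from_header(header, genus_species_only=True))
```

Taking the first two words is only a heuristic; names such as "Candidatus ..." or entries with subspecies would need separate handling.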

5

votes

3 answers

Read sorted and indexed BAM files faster in C++?

I have some large, sorted and indexed BAM alignment files. I'd like to read all the reads and apply a custom operation to each of them. The order isn't important. I'm using htslib in C++ to do the reading. I have a single powerful machine. The…

asked Oct 12 '17 at 23:24

SmallChess

2,699
3
19
35

5

votes

2 answers

Gene expression signature (AUC) from gene expression data with 48 + 48 samples

Does anyone have a good method to find a signature where you combine the expression from multiple genes to predict a specific condition? Assume you have performed sequencing of 48 cancer samples and 48 normal samples. Then find a combination of…

asked Oct 09 '17 at 16:08

user2300940

223
1
2
5

5

votes

0 answers

R Biostrings pairwiseAlignment to BAM

The R package Biostrings has a function to create a pairwiseAlignment from pattern and subject sequences. So far I can save the result into a text file using writePairwiseAlignments. I would like to save the result into a SAM/BAM file, but couldn't…

asked Oct 04 '17 at 07:43

Green

151
2

5

votes

1 answer

Can FASTA files have nucleotide and protein sequences within them; or must they only have 1 type?

Can FASTA files have nucleotide and protein sequences within them; or must they only have 1 type? For example, a FASTA file has 2 sequences. Can the first one encode amino acids while the second one encodes bases? Thank you

fasta

asked Sep 21 '17 at 02:13

user1510

59
1
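For the mixed-FASTA question above, a rough sketch of how one might inspect a file record by record. It relies on the fact that a FASTA record is just a header line plus sequence lines, with no field declaring the sequence type, so the type can only be guessed from the letters. The names `read_fasta`, `guess_type`, and `NUCLEOTIDE_CHARS` are hypothetical, and the heuristic is deliberately crude: a peptide made only of A, C, G, T, and N would be misreported as nucleotide.

```python
# Letters treated as nucleotide symbols in this sketch (an assumption;
# IUPAC ambiguity codes other than N are deliberately left out).
NUCLEOTIDE_CHARS = set("ACGTUN")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def guess_type(seq: str) -> str:
    """Guess 'nucleotide', 'protein', or 'unknown' from the letters used."""
    letters = set(seq.upper()) - {"-", "*"}
    if not letters:
        return "unknown"
    return "nucleotide" if letters <= NUCLEOTIDE_CHARS else "protein"


example = [">p1", "MKVLAAGIVG", ">n1", "ACGT", "ACGN"]
for name, seq in read_fasta(example):
    print(name, guess_type(seq))
```

A parser like this reads a mixed file without complaint; whether downstream tools accept one is a separate matter for each tool.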

5

votes

1 answer

hg38 GTF file with RefSeq annotations

I'm not sure what I'm missing, but I'm struggling to find an official hg38 GTF file with RefSeq annotations. I'd like to provide the GTF to Salmon to get gene-level annotations. Here's Salmon's help info for --geneMap: File containing a mapping…

asked Sep 21 '17 at 01:17

Mark Ebbert

1,354
10
22

5

votes

1 answer

cell type specific genes - heatmap using rank based approach

I was wondering if someone can help me with pseudo R code for my approach here, based on this paper, to plot a heatmap. Figure 3. Figure description: The top 40 enriched genes per cell type are shown in a heat map. Only highly expressed genes with FPKM >20…

asked Sep 20 '17 at 14:57

novicebioinforesearcher

771
1
6
15

5

votes

3 answers

Way to get genomic sequences at given coordinates without downloading fasta files of whole chromosomes/genomes first?

So I have a list of start and stop positions along chromosomes in different species, and I'd like to get the corresponding DNA sequence for each set of coordinates. In the past, I've just downloaded the genome as a FASTA file and then used pyfaidx to…

asked Sep 20 '17 at 02:22

Eric Brenner

132
6

5

votes

2 answers

Software recommendation: find DNA sequence distribution over entire transcript

I would like to create a density/histogram of the distribution of a particular DNA sequence over the entire transcript using R and/or command line tools. From here, I would like to use the coordinates of the bins to map the intron-exon diagram below…

asked Sep 15 '17 at 15:33

syntonicC

161
5

5

votes

1 answer

How can I calculate loss of heterozygosity (LOH) in NGS sequencing data?

I'm analyzing a tumor sample and a healthy sample from the same patient. I want to use sciClone to look at tumor clonality. One input is the genomic regions that need to be excluded due to LOH. I have both WES and WGS. Is there currently a "gold…

cancer

asked Sep 14 '17 at 13:07

story

1,573
1
8
15

5

votes

1 answer

Dealing with absence of coverage

In my snakemake workflow, I use deeptools bamCoverage (in a make_bigwig rule), then computeMatrix and plotProfile (in different plot_*_profile rules) to create coverage profiles for various kinds of small RNAs. It sometimes happens that there are…

asked Sep 14 '17 at 09:24

bli

3,130
2
15
36
