Most Popular

1500 questions
5
votes
1 answer

Does Picard MarkDuplicates toggle the PCR duplicate SAM flag?

I have an RNA-seq BAM file, and a few reads in it are puzzling me. According to the BAM header, the file is sorted by coordinate, was created using TopHat, and the MarkDuplicates step has not been run. But some reads are marked as duplicates in the…
svural
  • 151
  • 2
5
votes
4 answers

How to map PDB chains to Uniprot IDs using API services

I have a lot of PDB IDs, and I need to get the UniProt FASTA sequences for specific chains of these PDB IDs through API services. For example, imagine that I need to get the FASTA sequence of chain 'A' of '1kf6'. The UniProt entry (accession) for this '1kf6' chain is…
Sara
  • 777
  • 1
  • 6
  • 18
5
votes
2 answers

Creating a new reference genome B by changing genome A with mismatches from BAM file without long reads

This question is slightly related to this one: Improve a reference genome with sequencing data. Only in my case, I have a starting reference genome A, of a short phage genome, and reads from a related genome B, for which I haven't been able to find a…
719016
  • 2,324
  • 13
  • 19
5
votes
0 answers

Help with identifying disease modules

I've made an application that currently ranks all combinations of drug pairs relevant to a biological network/graph by how disruptive it is to delete the targets of a given drug pair. So, in short, my applications…
user1171426
  • 109
  • 2
5
votes
3 answers

How can I only get the species name for fasta sequences from blast results?

I am trying to make a phylogenetic tree from sequences obtained with BLAST. I have files that contain hundreds of FASTA protein sequences, which are all named like >NP_567483.1 transcriptional regulator [Pseudomonas aeruginosa PAO1]. But if I use…
JulianT
  • 51
  • 2
5
votes
3 answers

Read sorted and indexed BAM files faster in C++?

I have some large, sorted and indexed BAM alignment files. I'd like to read all the reads and apply a custom operation to each of them. The order isn't important. I'm using htslib in C++ to do the reading. I have a single powerful machine. The…
SmallChess
  • 2,699
  • 3
  • 19
  • 35
5
votes
2 answers

Gene expression signature (AUC) from gene expression data with 48 +48 samples

Does anyone have a good method to find a signature where you combine the expression from multiple genes to predict a specific condition? Assume you have performed sequencing of 48 cancer samples and 48 normal samples. Then find a combination of…
user2300940
  • 223
  • 1
  • 2
  • 5
5
votes
0 answers

R Biostrings pairwiseAlignment to BAM

The R package Biostrings has a function to create a pairwiseAlignment from pattern and subject sequences. So far I can save the result into a text file using writePairwiseAlignments. I would like to save the result into a SAM/BAM file, but couldn't…
Green
  • 151
  • 2
5
votes
1 answer

Can FASTA files have nucleotide and protein sequences within them; or must they only have 1 type?

Can FASTA files have nucleotide and protein sequences within them; or must they only have 1 type? For example, a FASTA file has 2 sequences. Can the first one encode amino acids while the second one encodes bases? Thank you
user1510
  • 59
  • 1
5
votes
1 answer

hg38 GTF file with RefSeq annotations

I'm not sure what I'm missing, but I'm struggling to find an official hg38 GTF file with RefSeq annotations. I'd like to provide the GTF to Salmon to get gene-level annotations. Here's Salmon's help info for --geneMap: File containing a mapping…
Mark Ebbert
  • 1,354
  • 10
  • 22
5
votes
1 answer

cell type specific genes - heatmap using rank based approach

I was wondering if someone can help me with pseudo R code for my approach here, based on this paper, to plot a heatmap. Figure 3. Figure description: The top 40 enriched genes per cell type are shown in a heat map. Only highly expressed genes with FPKM >20…
5
votes
3 answers

Way to get genomic sequences at given coordinates without downloading fasta files of whole chromosomes/genomes first?

So I have a list of start and stop positions along chromosomes in different species, and I'd like to get the corresponding DNA sequence for each set of coordinates. In the past, I've just downloaded the genome as a FASTA file and then used pyfaidx to…
Eric Brenner
  • 132
  • 6
5
votes
2 answers

Software recommendation: find DNA sequence distribution over entire transcript

I would like to create a density/histogram of the distribution of a particular DNA sequence over the entire transcript using R and/or command line tools. From here, I would like to use the coordinates of the bins to map the intron-exon diagram below…
syntonicC
  • 161
  • 5
5
votes
1 answer

How can I calculate loss of heterozygosity (LOH) in NGS sequencing data?

I'm analyzing a tumor sample and a healthy sample from the same patient. I want to use sciClone to look at tumor clonality. One input is the genomic regions that need to be excluded due to LOH. I have both WES and WGS. Is there currently a "gold…
story
  • 1,573
  • 1
  • 8
  • 15
5
votes
1 answer

Dealing with absence of coverage

In my snakemake workflow, I use deeptools bamCoverage (in a make_bigwig rule), then computeMatrix and plotProfile (in different plot_*_profile rules) to create coverage profiles for various kinds of small RNAs. It sometimes happens that there are…
bli
  • 3,130
  • 2
  • 15
  • 36