Most Popular
1500 questions
5
votes
1 answer
Does Picard MarkDuplicates toggle the PCR duplicate SAM flag?
I have an RNA-seq BAM file and there are a few reads that are puzzling me.
According to the BAM header, this file is sorted by coordinate, was created using TopHat, and the MarkDuplicates step has not been run. But some reads are marked as duplicates in the…
svural
- 151
- 2
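For context on the question above: the SAM specification reserves bit 0x400 of the FLAG field for "PCR or optical duplicate". What follows is a minimal sketch for checking that bit in plain SAM text records, assuming tab-separated lines with FLAG in the second column. The names `is_duplicate` and `count_duplicates` are illustrative and are not part of Picard, samtools, or TopHat.

```python
DUPLICATE_BIT = 0x400  # SAM FLAG bit for "PCR or optical duplicate"


def is_duplicate(flag: int) -> bool:
    """Return True if the duplicate bit is set in a SAM FLAG value."""
    return bool(flag & DUPLICATE_BIT)


def count_duplicates(sam_lines):
    """Count records whose FLAG has the duplicate bit set.

    Assumes plain-text SAM lines; header lines (starting with '@') are skipped.
    """
    n = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 1 and is_duplicate(int(fields[1])):
            n += 1
    return n
```

A BAM file would first need to be converted to SAM text (for example with `samtools view -h`) before lines could be passed to `count_duplicates`.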
5
votes
4 answers
How to map PDB chains to Uniprot IDs using API services
I have a lot of PDB IDs and I need to get the UniProt FASTA sequences for specific chains of these PDB IDs via API services. For example, imagine that I need to get the FASTA sequence of chain 'A' of '1kf6'. The UniProt entry (accession) for this '1kf6' chain is…
Sara
- 777
- 1
- 6
- 18
5
votes
2 answers
Creating a new reference genome B by changing genome A with mismatches from BAM file without long reads
This question is slightly related to this one:
Improve a reference genome with sequencing data
Only in my case, I have a starting genome reference A, of a short phage genome, and reads from a related genome B, for which I haven't been able to find a…
719016
- 2,324
- 13
- 19
5
votes
0 answers
Help with identifying disease modules
I've made an application that at this point ranks all combinations of drug pairs relevant to a biological network/graph in the order of how disruptive the outcome of deleting the targets of a given drug pair is. So, in short, my applications…
user1171426
- 109
- 2
5
votes
3 answers
How can I only get the species name for fasta sequences from blast results?
I am trying to make a phylogenetic tree from sequences obtained with BLAST. I have files that contain hundreds of FASTA protein sequences, all named like >NP_567483.1 transcriptional regulator [Pseudomonas aeruginosa PAO1]. But if I use…
JulianT
- 51
- 2
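One possible approach to the header format quoted in the question above is a regular expression that captures the trailing bracketed organism field. This is a sketch under two assumptions: the organism always appears in the last square brackets of the header, and the species name is the first two words of that field (genus and species epithet). The function names are hypothetical.

```python
import re

# Matches the last "[...]" group at the end of a FASTA header line.
BRACKET = re.compile(r"\[([^\[\]]+)\]\s*$")


def organism_from_header(header):
    """Return the bracketed organism field, or None if it is absent."""
    match = BRACKET.search(header)
    return match.group(1) if match else None


def species_from_header(header):
    """Return the first two words of the organism field (assumed genus + species)."""
    organism = organism_from_header(header)
    if organism is None:
        return None
    return " ".join(organism.split()[:2])


header = ">NP_567483.1 transcriptional regulator [Pseudomonas aeruginosa PAO1]"
print(species_from_header(header))
```

Headers whose organism field does not follow the genus-species pattern (for example, uncultured or unclassified entries) would need separate handling.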
5
votes
3 answers
Read sorted and indexed BAM files faster in C++?
I have some large, sorted and indexed BAM alignment files. I'd like to read all the reads and apply a custom operation to each of them. The order isn't important. I'm using htslib in C++ to do the reading. I have a single powerful machine.
The…
SmallChess
- 2,699
- 3
- 19
- 35
5
votes
2 answers
Gene expression signature (AUC) from gene expression data with 48 +48 samples
Does anyone have a good method to find a signature where you combine the expression of multiple genes to predict a specific condition?
Assume you have performed sequencing of 48 cancer samples and 48 normal samples. Then find a combination of…
user2300940
- 223
- 1
- 2
- 5
5
votes
0 answers
R Biostrings pairwiseAlignment to BAM
The R package Biostrings has a function to create a pairwiseAlignment from pattern and subject sequences.
So far I can save the result into a text file using writePairwiseAlignments. I would like to save the result into a SAM/BAM file, but couldn't…
Green
- 151
- 2
5
votes
1 answer
Can FASTA files have nucleotide and protein sequences within them; or must they only have 1 type?
Can FASTA files have nucleotide and protein sequences within them; or must they only have 1 type? For example, a FASTA file has 2 sequences. Can the first one encode amino acids while the second one encodes bases?
Thank you
user1510
- 59
- 1
5
votes
1 answer
hg38 GTF file with RefSeq annotations
I'm not sure what I'm missing, but I'm struggling to find an official hg38 GTF file with RefSeq annotations. I'd like to provide the GTF to Salmon to get gene-level annotations.
Here's Salmon's help info for --geneMap:
File containing a mapping…
Mark Ebbert
- 1,354
- 10
- 22
5
votes
1 answer
cell type specific genes - heatmap using rank based approach
I was wondering if someone can help me with pseudo R code for my approach here, based on this paper, to plot a heatmap
Figure 3
Figure description
The top 40 enriched genes per cell type are shown in a heat map. Only highly expressed genes with FPKM >20…
novicebioinforesearcher
- 771
- 1
- 6
- 15
5
votes
3 answers
Way to get genomic sequences at given coordinates without downloading fasta files of whole chromosomes/genomes first?
So I have a list of start and stop positions along chromosomes in different species, and I'd like to get the corresponding DNA sequence for each set of coordinates. In the past, I've just downloaded the genome as a FASTA file and then used pyfaidx to…
Eric Brenner
- 132
- 6
5
votes
2 answers
Software recommendation: find DNA sequence distribution over entire transcript
I would like to create a density/histogram of the distribution of a particular DNA sequence over the entire transcript using R and/or command line tools. From here, I would like to use the coordinates of the bins to map the intron-exon diagram below…
syntonicC
- 161
- 5
5
votes
1 answer
How can I calculate loss of heterozygosity (LOH) in NGS sequencing data?
I'm analyzing a tumor sample and a healthy sample from the same patient. I want to use sciClone to look at tumor clonality. One input is the genomic regions that need to be excluded due to LOH. I have both WES and WGS.
Is there currently a "gold…
story
- 1,573
- 1
- 8
- 15
5
votes
1 answer
Dealing with absence of coverage
In my snakemake workflow, I use deeptools bamCoverage (in a make_bigwig rule), then computeMatrix and plotProfile (in different plot_*_profile rules) to create coverage profiles for various kinds of small RNAs. It sometimes happens that there are…
bli
- 3,130
- 2
- 15
- 36