Most Popular

1500 questions
3
votes
2 answers

How to convert featureCounts to FPKM?

I have seen many posts about converting counts to RPKM and TPM, but none about converting counts to FPKM. I have paired-end RNA-Seq data and extracted the counts for all samples using featureCounts. There is a function to convert counts…
beginner
  • 631
  • 7
  • 15
3
votes
1 answer

Searching for start and stop codons for protein sequencing of contigs

I need to convert contigs into their respective protein sequences given a reference genome (i.e. I need to take a substring, whose position is already known on the string, and I need to locate the nearest start and stop codons). This is tricky…
3
votes
0 answers

How to assign the best gap penalty and gap extension penalty using BLOSUM65

For an assignment I must do a pairwise optimal local alignment using BLOSUM65 and five protein sequences. The algorithm I want to use is Smith-Waterman. Context (protein sequencing using BLASTP): all the sequences are the hemoglobin subunit a or…
3
votes
1 answer

Tool to remove a PCR contamination in NGS data

BACKGROUND: I work on NGS data (Illumina paired-end reads) from a full RNA extract (metagenomic). We are interested in the viral fraction of this extract. I observed contamination with a PCR amplicon. This contamination has been…
Untitpoi
  • 131
  • 3
3
votes
1 answer

Histosketch vs Count-Min sketch: preserving similarity

A preprint describing a new tool and its application to microbiome analysis was recently posted on bioRxiv[1]. At the core of this new tool, HULK, is a new data structure called a histosketch[2], which is similar in spirit to Count-Min sketches and…
Daniel Standage
  • 5,080
  • 15
  • 50
3
votes
1 answer

How can the autocovariances, autocorrelations, and autocorrelation coefficients be calculated from a protein amino acid sequence?

Given a normal protein sequence with the 20 standard amino acids, how can the 'autocovariances', 'autocorrelations', and 'autocorrelation coefficients' of the sequence be calculated? What is meant by these terms in the context of a protein…
Aalawlx
  • 517
  • 4
  • 12
3
votes
1 answer

Understanding SNP coding for association analysis

I'm working on a project about detecting SNP association with a disease. As I understand it, a SNP is a single-nucleotide variation that occurs in more than 1% of the population. So, if a gene is ATTG and there's a variant ATAG, then the SNP is…
PP mistery
  • 33
  • 3
3
votes
1 answer

How to query the Human Microbiome Project (HMP) to find all subjects with both 16S and WGS workups?

I am looking for a query to run on the HMP database that will return all subjects who have had BOTH 16S and whole-genome sequencing (WGS) workups. I am currently using this query... file.matrix_type = 16s_community OR file.matrix_type =…
ljs
  • 265
  • 1
  • 5
3
votes
1 answer

Random addition method for phylogenetic tree reconstruction

I have been working with nucleotide and protein data to reconstruct maximum parsimony trees in MEGA, where the "random addition" method is used to generate the initial tree(s). I need to know that when randomly chosen…
Sidra Younas
  • 503
  • 2
  • 13
3
votes
3 answers

How to get the product of a CDS

I need to extract the name of the protein from /product="protein_name" using bash commands. Beware, there is a lot of whitespace between lines. FEATURES Location/Qualifiers source 1..1266 /organism="Sarcophilus…
3
votes
1 answer

Using SNPs to identify mixed samples

Is there a way to identify mixed samples based on SNPs? Example input (table of genotypes for multiple samples): | | sample1 | sample2 | sample3 | |------|---------|---------|---------| | rs1 | AA | BB | AB | | rs2 | AA |…
burger
  • 2,179
  • 10
  • 21
3
votes
1 answer

Aligning sequence and comparing it against primer

I am looking to show how consistent a primer is across some genomic data. I have a primer of about 23 bp and want to compare it to about 5000 genomic sequences of 10 kb each. I am unsure how to format it the way I need to, since I am just trying to show…
Colin
  • 33
  • 4
3
votes
3 answers

Finding transposable elements using RepeatMasker

I'm using RepeatMasker to detect and classify transposable elements (TEs). My input is a eukaryotic non-reference genome. I ran RepeatMasker many times to mask the TEs, but it returned no annotation tables. Further, I used a different -species…
BioInfo
  • 374
  • 3
  • 13
3
votes
1 answer

How to iterate protein sequences using amino acids?

I am working on a program in Python that uses an aligned protein sequence file in two formats, PHYLIP (.phy) and Clustal (.aln). Example Clustal file: CLUSTAL 2.1 multiple sequence alignment Homo …
Sidra Younas
  • 503
  • 2
  • 13
3
votes
1 answer

Get genomic coordinates using GenomicFeatures by HGNC gene names

I want to get the coordinates of human genes from my list (consisting of HGNC gene IDs) using the GenomicFeatures and TxDb.Hsapiens.UCSC.hg19.knownGene R packages from…
lizaveta
  • 203
  • 1
  • 3