Most Popular

1500 questions
4 votes, 2 answers

Error rate setting in Canu error correction

I want to use Canu to correct my nanopore long reads (version: MinION R9.5), but I am not quite sure how to set the correctErrorRate. Should I follow the Canu manual (Nanopore R7 2D and Nanopore R9 1D Increase the maximum allowed difference in…
raymond
4 votes, 1 answer

Identity plus Similarity Interpretation

I am helping a colleague to interpret protein multiple sequence alignment results. My colleague used a third-party proprietary solution (I know...) to calculate the alignment, and one of the resulting tables is a pairwise distance matrix with…
4 votes, 2 answers

Is there a way to quickly verify the presence of some SNPs in Fastq files?

Is there a tool that can scan FASTQ files, without assembling them, for a custom list of user-defined SNPs?
Dan K
4 votes, 1 answer

Analyzing Illumina Counts

I'm pretty new to all of this; forgive me if this is a simple question. When I download Illumina counts from GEO (like the supplementary file in GSE89225), can I do comparisons directly on that file? Is there some normalization procedure I should…
julianstanley
4 votes, 1 answer

How to confirm exon shuffling in a gene?

Note: this question has also been asked on Reddit. I'm trying to confirm that the sequence of a novel gene is derived by exon shuffling between several different genes. I have the promoter sequence, gene sequence, and mRNA (with defined exon/intro…
mj2000
4 votes, 1 answer

Using signal peptide and the expression levels of signal recognition particle in secretome analysis

I have not found any work which investigates assessment of differences in levels of secreted proteins by taking advantage of differential expression of the genes which mediate the secretory pathway. For example, suppose the gene whose product binds…
jaslibra
4 votes, 1 answer

How to filter intervals (reads or genomic coordinates) that have the exact same 5' or 3' ends?

Say I have reads that overlap some genes that produce small RNAs, but I want only those reads that start at exactly the TSS of the loci. In other words, reads whose 5' end matches the 5' end of a genomic feature. 5'....3' ...+++++++...…
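The matching rule described in this question can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an answer taken from the thread: the `Interval` type, the 0-based half-open coordinates, and the function names are all hypothetical, and a real workflow would read BED/BAM records rather than hand-built tuples.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical interval record; coordinates are 0-based, half-open."""
    chrom: str
    start: int   # inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def five_prime(iv):
    """Return (chrom, strand, position) of the interval's 5' end.

    On the "+" strand the 5' end is the leftmost base; on the "-" strand
    it is the rightmost base (end - 1 in half-open coordinates).
    """
    pos = iv.start if iv.strand == "+" else iv.end - 1
    return (iv.chrom, iv.strand, pos)


def reads_sharing_five_prime(reads, features):
    """Keep only reads whose 5' end coincides with some feature's 5' end."""
    feature_starts = {five_prime(f) for f in features}
    return [r for r in reads if five_prime(r) in feature_starts]
```

Swapping `five_prime` for an analogous 3'-end function would give the 3' variant of the same filter. Strand is part of the lookup key, so a read on the opposite strand is never matched by coordinate alone.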
4 votes, 1 answer

Given a Genomic Ranges of SNPs, how to inject these SNPs in genome via BSGenome?

Let's say I have the genome hg19 loaded into R via BSgenome:
    library("BSgenome")
    hg19genome = getBSgenome('BSgenome.Hsapiens.UCSC.hg19', masked=FALSE)
I then have a list of SNPs loaded as a GRanges object, gr:
    library(GenomicRanges)
    > gr …
ShanZhengYang
4 votes, 1 answer

Software recommendations - DNA composition

I'm looking to generate data based on the DNA composition of a region of my genomes (data is incomplete genomes from HiSeq runs in fasta format). I'm looking for software which will give me sliding windows for GC content, GC skew, codon bias etc. I…
AudileF
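As a rough illustration of the sliding-window statistics this question asks about, here is a minimal Python sketch for GC content and GC skew. It is not one of the tools recommended in the thread; the function name `gc_windows` and the conventions (GC skew computed as (G - C) / (G + C), ambiguous bases such as N counted in the window length, trailing partial windows dropped) are assumptions made for the example.

```python
def gc_windows(seq, window, step):
    """Return (start, gc_content, gc_skew) for each full window of `seq`.

    gc_content = (G + C) / window length
    gc_skew    = (G - C) / (G + C), reported as 0.0 when the window has no G or C
    """
    seq = seq.upper()
    results = []
    # Only full-length windows are reported; a trailing partial window is skipped.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_content = (g + c) / window
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, gc_content, gc_skew))
    return results
```

For example, `gc_windows("GGCCAATT", 4, 2)` yields three windows starting at positions 0, 2, and 4. Codon bias is not covered here, since it needs annotated coding regions rather than raw sequence.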
4 votes, 1 answer

Cannot install chromosomer

I am trying to install chromosomer, but it fails. Can anybody help me, please?
    $ pip install chromosomer
    Collecting chromosomer
    Could not find a version that satisfies the requirement chromosomer (from versions: )
    No matching distribution found for…
Biomagician
4 votes, 1 answer

Get canonical transcript from UCSC

I am using the following command to get all RefSeq genes from UCSC:
    /usr/bin/mysql --user=genomep --password=password --host=genome-mysql.cse.ucsc.edu \
        -A -D hg38 -e 'select concat(t.name, ".", i.version) name, \
        k.locusLinkId as…
terdon
4 votes, 1 answer

SARS-CoV-2 sequence used in the AlphaSeq Antibody Datasets to predict binding affinity

Currently, we are building a sequence-based deep-learning model to predict binding affinity between antibody and antigen. For this, we are training a sequence-based model with the AlphaSeq Antibody dataset…
Krishna
4 votes, 1 answer

split fastq file containing a sequence block at different locations

I have some fastq files (obtained from nanopore sequencing) that contain reads that can take any of these 5 forms:
    a known CDS with 3'UTR: CDS----------------------- (seq1: original sequence)
    the same CDS with a block of 50bp represented by…
4 votes, 1 answer

What does a question mark ("?") mean in Picard metrics files when I expect a number (integer, float, etc.)

Does anyone know what conditions must be true for Picard to put a ? where I expect a number? Here is an example from the output of the tool CollectWgsMetrics:
    ❯ cat sample.wgs_metrics.txt | head -n8 | tail -n2 | verticalize | grep…
clintval
4 votes, 1 answer

Best way to detect long insertions in bisulfite sequencing data?

I am interested in identifying indels in whole-genome bisulfite sequencing data (76 bp paired-end). Currently, I do this by setting the -rfg and -rdg affine gap penalty scores for bowtie2 to more permissive values than the default 5+3N and mapping…
Ben D.