Most Popular

1500 questions

4 votes, 1 answer

Telling grep to treat N as [ATCG]

Okay, so I'm using grep to try to get a preview of some trimming operations that are not going as expected. Let's say that my sequence in the FASTQ file is: ATNGCNATCG. What I want to do is: grep "ATCGCTATCG" my.fastq ...and match the sequence given…
RPINerd
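A minimal sketch of one way to get this behaviour, assuming the Ns occur in the reads and the query contains only A/C/G/T: expand every query base X into the character class `[XN]`, so an N in the read can stand in for any base. The helper names `n_tolerant_pattern` and `matching_reads` are illustrative, not from the question. The printed pattern string can also be passed straight to grep, e.g. `grep '[AN][TN][CN][GN][CN][TN][AN][TN][CN][GN]' my.fastq`.

```python
import re


def n_tolerant_pattern(query: str) -> re.Pattern:
    """Compile a regex in which each query base also matches an N in the read."""
    # Each base X becomes [XN], so an ambiguous N in the read matches any query base.
    return re.compile("".join(f"[{base}N]" for base in query.upper()))


def matching_reads(fastq_lines, query):
    """Yield FASTQ sequence lines (line 2 of every 4-line record) containing the query."""
    pattern = n_tolerant_pattern(query)
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1 and pattern.search(line):
            yield line.rstrip("\n")
```

Using a character class per position keeps the match strictly positional (no gaps), which mirrors what a trimmer comparing an adapter against the read would do.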

4 votes, 2 answers

Is there a good practice (or an easy one for a SBOL newbie) to set up a collection of biological parts and devices?

I have a collection of biological parts and devices in a particular format that basically stores information about the sequence of the elements. My intention is to move to a more standardized data structure with a particular interest in Synthetic…

4 votes, 0 answers

Do RepeatModeler results contain functional domains?

The repeat families predicted by RepeatModeler contain known transposable elements (TEs) and unknown ones. How do we know whether some of these may actually lie within a functional domain of a gene or represent a functional domain? Could it be…

4 votes, 1 answer

Allele Count and Allele Frequency in VCF files

I'm working in bioinformatics, but my computational skills far outstrip my knowledge of biology or genomics. So forgive the noobish question. According to the VCF specification, the INFO column for each coordinate [CHROM, POS] can contain values…
abalter
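For orientation: the VCF specification defines AC as the allele count in genotypes for each ALT allele, AN as the total number of alleles in called genotypes, and AF as the allele frequency for each ALT allele. A minimal sketch of how the three relate, assuming AF is taken simply as AC/AN over the sample genotypes (some callers derive AF differently, so the INFO value in a given file may not match exactly). The function `allele_stats` is a hypothetical helper.

```python
import re
from collections import Counter


def allele_stats(genotypes, n_alt):
    """Return (AC, AN, AF) for one VCF record from its sample GT strings.

    AC holds one count per ALT allele (1..n_alt), AN is the number of called
    alleles, and AF = AC / AN. Missing alleles ('.') are not counted.
    """
    counts = Counter()
    an = 0
    for gt in genotypes:
        # GT alleles are separated by '/' (unphased) or '|' (phased).
        for allele in re.split(r"[/|]", gt):
            if allele == ".":
                continue  # uncalled allele: excluded from AN
            an += 1
            counts[int(allele)] += 1
    ac = [counts[i] for i in range(1, n_alt + 1)]
    af = [c / an for c in ac] if an else [None] * n_alt
    return ac, an, af
```

For example, genotypes `0/1`, `1/1`, `0/0`, `./.` and `0|2` at a site with two ALT alleles give AN = 8 (the missing sample contributes nothing), AC = [3, 1] and AF = [0.375, 0.125].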

4 votes, 1 answer

Find pattern that is present twice and allow <=2 mismatches on each

I have a fastq file of 400,000 reads (so speed is important). In the sequences there are barcodes integrated that should be present twice. Given a barcode, I want to find the sequences that have the barcode present twice with <= 2 mismatches. So,…
nafizh
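A minimal sketch of the "present twice with <= 2 mismatches" test, assuming mismatches means substitutions only (Hamming distance, no indels) and that the two copies must not overlap. The names `barcode_hits` and `has_barcode_twice` are hypothetical. Because every window has the barcode's length, greedily taking the leftmost hit and jumping past it finds the maximum number of non-overlapping hits.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def barcode_hits(read: str, barcode: str, max_mismatches: int = 2):
    """Start positions of non-overlapping windows within max_mismatches of barcode."""
    k = len(barcode)
    hits = []
    i = 0
    while i <= len(read) - k:
        if hamming(read[i:i + k], barcode) <= max_mismatches:
            hits.append(i)
            i += k  # skip past this hit so the next one cannot overlap it
        else:
            i += 1
    return hits


def has_barcode_twice(read: str, barcode: str, max_mismatches: int = 2) -> bool:
    return len(barcode_hits(read, barcode, max_mismatches)) >= 2
```

This is O(read length × barcode length) per read in pure Python; for 400,000 reads that may be acceptable, but a compiled fuzzy-matching library would likely be faster.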

4 votes, 2 answers

What is the typical host-to-bug DNA ratio found in nanopore sequencing without amplification?

I'm interested in sequencing a human sputum sample using an ONT MinION without performing any type of whole genome DNA amplification or targeted PCR. Has anyone found a good reference (or anecdotal evidence) for a range of human to pathogen DNA…
TimD1

4 votes, 1 answer

Are there computational tools to extract features of DNA sequences?

I am looking for tools to extract features from short DNA sequences. For example, entropy, complexity, GC-content, etc. I have found the generateFeatures.py script from the PyFeat repo, but is there a more widely used source code or a standard…
0x90
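Two of the features named in the question are simple enough to compute directly. A minimal sketch, assuming "entropy" means the Shannon entropy (in bits) of the k-mer frequency distribution within one sequence; `gc_content` and `shannon_entropy` are illustrative helpers, not functions from PyFeat.

```python
import math
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def shannon_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy (bits) of the overlapping k-mer distribution in seq."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    return -sum((n / total) * math.log2(n / total) for n in Counter(kmers).values())
```

With k = 1 the maximum is 2 bits (all four bases equally frequent); raising k makes the measure sensitive to repeated motifs rather than just base composition.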

4 votes, 2 answers

How can I use annotations to remove variants not relevant to cancer risk?

I currently have ~180 whole germlines and around 10M SNPs/indels. I would like to build a predictive model using Machine Learning (ML) techniques to predict cancer risk according to these germline variants. The thing is, most of these 10M variants…
Ezequiel

4 votes, 2 answers

Sort reads in BAM file based on presence of specific deletion?

I have an indexed BAM file containing long-read sequencing data and I'd like to split the reads contained within into those with a known deletion and those without the deletion (I have the deletion coordinates available to me) when mapped against…
jazzbo
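A minimal sketch of the classification step, assuming the deletion appears as a `D` operation in each read's CIGAR string. With pysam, the two inputs would come from `AlignedSegment.reference_start` (0-based) and `AlignedSegment.cigarstring`. The `tolerance` parameter is a hypothetical allowance for breakpoints that aligners place a few bases apart. Long-read aligners may also represent a large deletion as a split (supplementary) alignment instead of a `D` operation, which this sketch does not handle.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def deletions(ref_start: int, cigar: str):
    """Yield (start, end) reference intervals (0-based, half-open) of D operations."""
    pos = ref_start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "D":
            yield pos, pos + length
        if op in REF_CONSUMING:
            pos += length


def has_deletion(ref_start, cigar, del_start, del_end, tolerance=10):
    """True if the read carries a deletion whose ends lie within `tolerance` bp of the known one."""
    return any(abs(s - del_start) <= tolerance and abs(e - del_end) <= tolerance
               for s, e in deletions(ref_start, cigar))
```

Reads overlapping the deletion region can then be written to one of two output BAMs depending on the result.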

4 votes, 1 answer

Difference between genome assembly and genome sequence alignment to a reference to find structural variants

I'm trying to determine what the difference and benefits of genome assembly and genome sequence alignments are when trying to identify structural variants or transposons in populations. I've been scouring the internet but have only really come…
M4r1n4

4 votes, 1 answer

Samtools Index: Chromosome Blocks not Continuous

I am working with short-read whole-genome sequences from the NCBI's SRA. I have aligned and sorted all of my short-read sequences and am attempting to index each sequence into .bai format using samtools index, but am running into a couple of…
annabelperry
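This error typically means the BAM is not coordinate-sorted (for example, it is name-sorted, or several sorted files were concatenated), so the reads for one chromosome do not form a single contiguous block; re-running `samtools sort` before `samtools index` is the usual remedy. A minimal diagnostic sketch, assuming the records are supplied as `(reference_name, position)` pairs in file order (with pysam, `read.reference_name` and `read.reference_start`). The helper name is hypothetical.

```python
def first_unsorted_record(records):
    """Return the index of the first record that breaks coordinate order, or None.

    A chromosome that reappears after another chromosome has started, or a
    position that goes backwards within a chromosome, prevents a coordinate
    index from being built.
    """
    seen = set()
    current, last_pos = None, -1
    for i, (ref, pos) in enumerate(records):
        if ref != current:
            if ref in seen:
                return i  # this chromosome's block was already closed earlier
            seen.add(ref)
            current, last_pos = ref, -1
        elif pos < last_pos:
            return i  # position decreased within the same chromosome
        last_pos = pos
    return None
```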

4 votes, 0 answers

Kraken2 > OTU format > Phyloseq

A collaborator has passed me Kraken2 outputs *.report and *.kraken, from a metatranscriptomic sequencing experiment conducted on the MinION. I would like to make a tree of the data using a standard phylogenetics package such as phyloseq,…
Reebola95
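One way to get from `*.report` files to an OTU-style count matrix is to keep the species-level rows from each report and merge them across samples. A minimal sketch, assuming the standard six-column Kraken2 report (percentage, clade reads, direct reads, rank code, NCBI taxid, indented name) without the optional minimizer columns, and using clade read counts at rank `S`. The function names are hypothetical; the resulting taxa × samples table could be written to CSV and loaded as a count matrix in R.

```python
import csv
from collections import defaultdict


def species_counts(report_lines, rank="S"):
    """Map (taxid, name) -> clade read count for rows of one Kraken2 report at `rank`."""
    counts = {}
    for row in csv.reader(report_lines, delimiter="\t"):
        if len(row) < 6 or row[3] != rank:
            continue
        taxid, name = row[4], row[5].strip()  # names are indented with spaces
        counts[(taxid, name)] = int(row[1])
    return counts


def otu_table(reports):
    """Merge {sample: report_lines} into {taxon: {sample: count}}, filling absences with 0."""
    table = defaultdict(dict)
    for sample, lines in reports.items():
        for taxon, n in species_counts(lines).items():
            table[taxon][sample] = n
    samples = list(reports)
    return {taxon: {s: row.get(s, 0) for s in samples} for taxon, row in table.items()}
```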

4 votes, 1 answer

How to calculate module-trait relationship when trait data is in binary format?

I have a dataset of 50 breast cancer samples. These samples are classified into four subtypes Lum A, Lum B, Her2 and Basal. I have been working with lncRNAs and protein-coding genes. To identify the functions of lncRNAs, I have used WGCNA through…
user9114
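A common way to handle a categorical trait such as subtype is to code each category as its own 0/1 indicator column and correlate module eigengenes with each column; a Pearson correlation against a 0/1 variable is the point-biserial correlation. A minimal sketch under that assumption, with hypothetical helper names (WGCNA itself would do this in R on a whole eigengene matrix).

```python
import math


def one_hot(labels, categories):
    """Turn a categorical subtype vector into one 0/1 indicator vector per category."""
    return {c: [1.0 if lab == c else 0.0 for lab in labels] for c in categories}


def pearson(x, y):
    """Pearson correlation; with a 0/1 vector this is the point-biserial correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def module_trait_correlations(eigengene, labels, categories):
    """Correlation of one module eigengene with each subtype indicator."""
    return {c: pearson(eigengene, ind) for c, ind in one_hot(labels, categories).items()}
```

Each category must contain at least one sample (and not all samples), otherwise its indicator has zero variance and the correlation is undefined.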

4 votes, 1 answer

How to design DESeq2 LRT model with individuals nested in 2 levels?

We have a complicated experimental design that we would like to perform LRT analysis for. Our main goal is to discover significant genes for the "Injection:Social" interaction term across the entire dataset by removing it from the LRT reduced model,…
jfaberha

4 votes, 1 answer

Where can I find lncRNA expression data for different cell types?

Are there any publicly available databases providing expression data for long non-coding RNAs (lncRNAs) across cell types of multicellular organisms? Alternatively, are there lesser known UCSC tracks for this? Example 1: I want to compare…
Gawain