Most Popular
1500 questions
4 votes, 1 answer
Find intervals that genes fall within their range
I am trying to run some frequency-based statistics to identify selective sweeps in my study system using the R package "PopGenome".
To proceed, I have split my genome data (whole-genome sequencing) into 5 Mb chunks:
cregions <- character(31)
cregions[1]…
Anna1364 (reputation 516)
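A minimal sketch of the interval logic behind the question, written in Python rather than R: it tiles a chromosome into fixed 5 Mb windows and reports which windows a gene overlaps. The chromosome length and gene coordinates are hypothetical, coordinates are assumed 1-based and inclusive (as in GFF/VCF), and nothing here is PopGenome's API.

```python
WINDOW = 5_000_000  # 5 Mb chunk size, as in the question


def make_windows(chrom_length, size=WINDOW):
    """Return (start, end) tuples tiling 1..chrom_length in steps of `size`."""
    return [(s, min(s + size - 1, chrom_length))
            for s in range(1, chrom_length + 1, size)]


def windows_for_gene(gene_start, gene_end, windows):
    """Indices of windows that overlap the inclusive interval [gene_start, gene_end]."""
    return [i for i, (ws, we) in enumerate(windows)
            if gene_start <= we and gene_end >= ws]


# Hypothetical 12 Mb chromosome and a gene straddling the first boundary.
windows = make_windows(12_000_000)
print(windows)  # [(1, 5000000), (5000001, 10000000), (10000001, 12000000)]
print(windows_for_gene(4_999_000, 5_002_000, windows))  # [0, 1]
```

A gene can overlap two chunks, so any per-window statistic needs a rule (e.g. assign by gene midpoint) for boundary-spanning genes.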
4 votes, 1 answer
Generating DNA sequences with constraints
I would like some advice on potential strategies to address the following problem.
I want to write a program that will generate DNA sequences that are optimized on two constraints based on an input genome:
The GC content should be as close to the…
cmdoret (reputation 595)
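The excerpt is truncated after the GC-content constraint, so this sketch only addresses that one: it draws a random sequence with exactly round(gc_target × length) G/C bases. The target would presumably be measured from the input genome with `gc_content`; the second constraint is not covered, and the function names are illustrative.

```python
import random


def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def random_seq_with_gc(length, gc_target, seed=None):
    """Random DNA string whose GC fraction is as close as possible to gc_target."""
    rng = random.Random(seed)
    n_gc = round(gc_target * length)           # number of G/C positions
    bases = [rng.choice("GC") for _ in range(n_gc)]
    bases += [rng.choice("AT") for _ in range(length - n_gc)]
    rng.shuffle(bases)                         # spread G/C along the sequence
    return "".join(bases)


seq = random_seq_with_gc(1000, 0.41, seed=1)
print(len(seq), gc_content(seq))  # 1000 0.41
```

Fixing the G/C count and shuffling hits the target exactly; an optimiser for additional constraints (e.g. simulated annealing over point mutations) could start from this sequence.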
4 votes, 1 answer
How are PDO and PDX used in computational and predictive models for tumour biology?
There are wet-lab methods based on patient-derived models, namely Patient-Derived Xenografts (PDX) and Patient-Derived Organoids (PDO), that reflect tumor biology.
Are there any databases or computational tools that use the outcomes from PDO/PDX experiments to create a…
0x90 (reputation 1,437)
4 votes, 2 answers
Run Nextflow with file dependencies inside a Docker container?
I’m trying to run a Nextflow workflow in a custom Docker container. Without Docker, the workflow succeeds. But running it inside the container leads to an error because a dependent file cannot be found.
Here’s a minimal script to illustrate the…
Konrad Rudolph (reputation 4,845)
4 votes, 1 answer
Remove variants that do not map to human genome
[This question was also asked on Biostars]
I received an hg38 VCF file whose variants were imputed with 1000 Genomes. I've encountered some issues with the VCF: REF alleles that do not align to a reference genome, ALT alleles that do not appear to…
John Rouhana (reputation 151)
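One part of the problem, REF alleles that disagree with the reference, can be sketched as a plain check of each record's REF against the reference sequence at its 1-based POS. The reference dictionary and VCF lines are hypothetical toy data; on a real hg38 file the sequences would come from the FASTA and the file would be streamed, and ALT-allele or liftover issues are not handled.

```python
def ref_matches(reference, chrom, pos, ref_allele):
    """True if ref_allele equals the reference bases starting at 1-based pos."""
    seq = reference.get(chrom)
    if seq is None or pos < 1:
        return False  # contig absent from the reference, or invalid position
    start = pos - 1   # VCF POS is 1-based; Python strings are 0-based
    observed = seq[start:start + len(ref_allele)]
    return observed.upper() == ref_allele.upper()


def filter_vcf_lines(lines, reference):
    """Yield header lines unchanged and only the records whose REF matches."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos, _vid, ref = line.rstrip("\n").split("\t")[:4]
        if ref_matches(reference, chrom, int(pos), ref):
            yield line


# Hypothetical toy reference and VCF records, for illustration only.
reference = {"chr1": "ACGTACGTAC"}
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t3\t.\tG\tA",    # reference base at 3 is G: kept
    "chr1\t4\t.\tA\tC",    # reference base at 4 is T: dropped
    "chr2\t1\t.\tA\tG",    # contig not in reference: dropped
    "chr1\t5\t.\tACG\tA",  # reference bases 5-7 are ACG: kept
]
kept = list(filter_vcf_lines(vcf, reference))
```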
4 votes, 2 answers
Entrez.efetch returns incomplete genbank records
I am using the Biopython Entrez.efetch function to retrieve all features (CDS, mRNA, ...) of some genomes.
In this case (NC_014649, Acanthamoeba polyphaga mimivirus), it works as expected:
from Bio import Entrez, SeqIO
handle =…
cmdoret (reputation 595)
4 votes, 2 answers
How is BLAST's nr database created?
Is there a paper or web page describing the procedure for creating the nr database used by NCBI's BLAST implementation?
I presume it's some type of clustering, but I'm curious about how exactly sequences are condensed into non-redundant…
juniper- (reputation 900)
4 votes, 1 answer
Correlating gene expression with qualitative variables
I have a gene expression dataset that I want to investigate. In particular, I would like to understand whether there is any correlation between each gene's expression and some quantitative or qualitative data (say, correlation between gene 'XPTO',…
Sos (reputation 141)
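For a qualitative variable a Pearson coefficient is not defined, but one common effect-size analogue is the correlation ratio η, where η² is the between-group share of the total sum of squares (0 means no association, 1 means the groups fully explain the variation). This sketch swaps in that technique as one option; the tissue labels and XPTO values are hypothetical, and a significance test (e.g. one-way ANOVA or Kruskal-Wallis) would still be needed per gene.

```python
from statistics import mean


def correlation_ratio(groups, values):
    """Correlation ratio (eta) between a categorical and a numeric variable.

    eta**2 = between-group sum of squares / total sum of squares.
    """
    grand = mean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # constant expression: no variation to explain
    by_group = {}
    for g, v in zip(groups, values):
        by_group.setdefault(g, []).append(v)
    ss_between = sum(len(vs) * (mean(vs) - grand) ** 2
                     for vs in by_group.values())
    return (ss_between / ss_total) ** 0.5


# Hypothetical expression of gene XPTO across samples from two tissues.
tissue = ["liver", "liver", "brain", "brain", "brain"]
xpto = [5.1, 4.8, 9.7, 10.2, 9.9]
print(round(correlation_ratio(tissue, xpto), 3))
```

For quantitative variables, Python 3.10+ offers `statistics.correlation` for Pearson's r.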
4 votes, 1 answer
Find gene at position from gff or gbk file
I have a VCF file with SNPs from a bacterial genome and want to find out whether the SNPs are located inside genes. Is there a CLI tool to which I can pass a VCF file and a GFF or GBK file and that returns the names of the genes?
haegglund (reputation 91)
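The lookup itself is an interval-containment test: collect the `gene` features from the GFF3 and, for each SNP position (VCF POS), return the genes whose 1-based inclusive interval contains it. This is a minimal linear-scan sketch on a hypothetical toy annotation; for real files, tools such as `bedtools intersect`, which accepts VCF and GFF inputs, do the same overlap efficiently.

```python
def parse_gff_genes(gff_lines):
    """Collect (seqid, start, end, name) for every 'gene' feature in GFF3 lines."""
    genes = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        # Column 9 holds key=value pairs separated by ';'
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        name = attrs.get("Name", attrs.get("ID", "."))
        genes.append((cols[0], int(cols[3]), int(cols[4]), name))
    return genes


def genes_at(genes, seqid, pos):
    """Names of genes whose 1-based inclusive interval contains pos."""
    return [name for s, start, end, name in genes
            if s == seqid and start <= pos <= end]


# Hypothetical toy annotation: two overlapping genes on one contig.
gff = [
    "##gff-version 3",
    "contig1\tdemo\tgene\t100\t500\t.\t+\t.\tID=g1;Name=geneA",
    "contig1\tdemo\tCDS\t100\t500\t.\t+\t0\tParent=g1",
    "contig1\tdemo\tgene\t450\t900\t.\t-\t.\tID=g2;Name=geneB",
]
genes = parse_gff_genes(gff)
for pos in (120, 460, 1000):  # SNP positions, e.g. from the VCF POS column
    print(pos, genes_at(genes, "contig1", pos))
```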
4 votes, 2 answers
How to dump genes from GenBank in GFF3 format?
This question has also been asked on BioStars
If I look at this record in GenBank I see about 6k genes:
https://www.ncbi.nlm.nih.gov/nuccore/CM000760?report=gbwithparts
I'd really like to be able to dump those genes in GFF3 format, but I'm guessing…
Dan (reputation 612)
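The main pitfall when dumping GenBank genes as GFF3 is the coordinate convention: Biopython feature locations use a 0-based start and an exclusive end, while GFF3 columns 4 and 5 are 1-based and inclusive. The helper below is a hypothetical sketch that formats one gene line; it assumes the ID contains no GFF3-reserved characters, and the output file would also need a `##gff-version 3` header line.

```python
def gene_to_gff3(seqid, start0, end, strand, gene_id, source="GenBank"):
    """Format one gene as a tab-separated GFF3 line.

    start0/end follow Biopython's location convention (0-based start,
    exclusive end); GFF3 is 1-based and inclusive, so only the start moves.
    Assumes gene_id contains no GFF3-reserved characters (; = & ,).
    """
    strand_char = {1: "+", -1: "-"}.get(strand, ".")  # None or 0 -> unknown
    cols = [seqid, source, "gene", str(start0 + 1), str(end),
            ".", strand_char, ".", f"ID={gene_id}"]
    return "\t".join(cols)


print(gene_to_gff3("CM000760", 0, 300, -1, "gene1"))

# With a Biopython SeqRecord `record` parsed from the GenBank file, a loop
# such as the following (not executed here) would emit every gene:
# for feat in record.features:
#     if feat.type == "gene":
#         tag = feat.qualifiers.get("locus_tag", ["unknown"])[0]
#         print(gene_to_gff3(record.id, int(feat.location.start),
#                            int(feat.location.end), feat.location.strand, tag))
```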
4 votes, 1 answer
Authoritative source on human cytogenetic regions?
I am looking for a database that would keep track of human cytogenetic regions and genomic coordinates per genome assembly. I had expected the Genome Reference Consortium to have it, or Ensembl, or NCBI, but I could not find it.
Is there a place…
Ramiro Magno (reputation 165)
4 votes, 1 answer
How to represent trans-spliced genes in GTF?
For example, see this gene (nad1) in ENA:
http://www.ebi.ac.uk/ena/data/view/ABI60879
If you look at the XML for that gene you see the following:
join(
DQ984518.1:324706..325091,
complement(DQ984518.1:24417..24498),
…
Dan (reputation 612)
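A first step is splitting the INSDC `join(...)` location into segments with their own strand. This sketch handles only segment-level `complement(acc:start..end)` (not `complement(join(...))`) and tolerates the spaces seen in the rendered XML. Because the record above is truncated, the example string is a hypothetical two-segment version built from the quoted segments. Emitting one GTF `exon` row per segment, in join order, is one possible representation; a single transcript with mixed strands may be rejected by some GTF consumers.

```python
import re

# One location segment, optionally wrapped in complement(...).
SEGMENT = re.compile(r"(complement\()?([A-Za-z0-9_.]+):(\d+)\.\.(\d+)")


def parse_join(location):
    """Return (accession, start, end, strand) for each segment, in join order."""
    loc = re.sub(r"\s+", "", location)  # drop layout whitespace
    return [(acc, int(start), int(end), "-" if comp else "+")
            for comp, acc, start, end in SEGMENT.findall(loc)]


def to_gtf_exons(segments, gene_id, transcript_id, source="ENA"):
    """One GTF exon row per segment; exon_number follows the join order."""
    rows = []
    for n, (acc, start, end, strand) in enumerate(segments, 1):
        attrs = (f'gene_id "{gene_id}"; transcript_id "{transcript_id}"; '
                 f'exon_number "{n}";')
        rows.append("\t".join([acc, source, "exon", str(start), str(end),
                               ".", strand, ".", attrs]))
    return rows


# Hypothetical two-segment location using only the segments quoted above.
segs = parse_join("join(DQ984518.1: 324706 .. 325091 , "
                  "complement(DQ984518.1: 24417 .. 24498))")
for row in to_gtf_exons(segs, "nad1", "nad1-t1"):  # transcript_id is made up
    print(row)
```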
4 votes, 2 answers
After artificially creating events in a FASTA file, how do I keep track of the old coordinates?
I'm beginning with the reference genome in FASTA format, hg19. I am reading the sequence into a Python dictionary with BioPython:
genome_dictionary = {}
for seq_record in SeqIO.parse(input_fasta_file, "fasta"):
…
ShanZhengYang (reputation 1,691)
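One straightforward way to keep the old coordinates is to apply all events in a single pass and record, for every base of the new sequence, which original index it came from (`None` for inserted bases). In this sketch, events are hypothetical `(start, end, replacement)` tuples on original 0-based, half-open coordinates: a deletion has an empty replacement, an insertion has `start == end`. A per-base list is fine for a chromosome fragment but too large for all of hg19 in Python; there, storing only the breakpoints (the idea behind UCSC liftOver chain files) scales better.

```python
def apply_events(seq, events):
    """Apply non-overlapping edits to seq and track original coordinates.

    events: (start, end, replacement) on ORIGINAL 0-based half-open coords.
    Returns (new_seq, origin) where origin[i] is the original index of
    new_seq[i], or None if that base was inserted.
    """
    new_parts, origin = [], []
    cursor = 0
    for start, end, replacement in sorted(events):
        if start < cursor:
            raise ValueError("events overlap")
        new_parts.append(seq[cursor:start])       # untouched stretch
        origin.extend(range(cursor, start))
        new_parts.append(replacement)             # inserted/substituted bases
        origin.extend([None] * len(replacement))
        cursor = end                              # skip deleted/replaced bases
    new_parts.append(seq[cursor:])
    origin.extend(range(cursor, len(seq)))
    return "".join(new_parts), origin


# Delete "CC" (original 2-4) and insert "NNN" before original index 6.
new_seq, origin = apply_events("AACCGGTT", [(2, 4, ""), (6, 6, "NNN")])
print(new_seq)  # AAGGNNNTT
print(origin)   # [0, 1, 4, 5, None, None, None, 6, 7]
```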
4 votes, 3 answers
What command/invocation is used to generate NCBI 16SMicrobial blastdb
I'm looking for the exact invocation used to generate the 16SMicrobial database that you can download from here:
https://ftp.ncbi.nlm.nih.gov/blast/db/
I'm hoping to create the same type of blastdb with the same type of metadata with custom…
amblina (reputation 332)
4 votes, 4 answers
Analysis of differential transcript usage (DTU)
Recent breakthroughs in bioinformatics tools for quantification (e.g. Cufflinks/Kallisto/Salmon etc.) and tools which can identify differential transcript usage (DTU) (e.g. DRIMSeq, Cufflinks etc.) mean that from RNA-seq data we can now relatively…
Kristoffer Vitting-Seerup (reputation 374)
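DTU concerns the relative proportions of a gene's isoforms rather than the gene's total expression. The sketch below converts transcript-level abundances (e.g. TPM from Salmon or Kallisto) into per-gene isoform fractions; the transcript and gene IDs are hypothetical. In the example the gene total is identical in both conditions, so there is no differential gene expression, yet the dominant isoform switches. Tools such as DRIMSeq test whether such proportions differ statistically, which this sketch does not attempt.

```python
from collections import defaultdict


def isoform_fractions(abundance, tx2gene):
    """Map each transcript to its share of its gene's total expression.

    abundance: {transcript_id: expression}; tx2gene: {transcript_id: gene_id}.
    Genes with zero total expression get fractions of 0.0.
    """
    gene_total = defaultdict(float)
    for tx, value in abundance.items():
        gene_total[tx2gene[tx]] += value
    return {tx: (value / gene_total[tx2gene[tx]]
                 if gene_total[tx2gene[tx]] > 0 else 0.0)
            for tx, value in abundance.items()}


# Hypothetical gene with two isoforms in two conditions (same gene total).
tx2gene = {"tx1": "geneX", "tx2": "geneX"}
print(isoform_fractions({"tx1": 30.0, "tx2": 10.0}, tx2gene))  # {'tx1': 0.75, 'tx2': 0.25}
print(isoform_fractions({"tx1": 10.0, "tx2": 30.0}, tx2gene))  # {'tx1': 0.25, 'tx2': 0.75}
```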