Most Popular

1500 questions
4 votes, 1 answer

Find which intervals genes fall within

I am trying to run some frequency-based statistics to identify selective sweeps in my study system using the R package "PopGenome". To proceed, I have split my genome data (whole-genome sequencing) into 5 Mb chunks: cregions <- character(31) cregions[1]…
Anna1364
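Assigning genes to fixed-size chunks comes down to integer arithmetic on coordinates. The question itself uses R; the following is a minimal Python sketch of the same idea. It assumes 1-based, inclusive gene coordinates and windows that start at position 1, and the gene names are hypothetical. It lists every 5 Mb window that a gene overlaps.

```python
# Sketch: find which fixed-size genomic windows (chunks) each gene overlaps.
# Assumptions: 1-based, inclusive coordinates; windows start at position 1,
# so window 0 = 1..5,000,000, window 1 = 5,000,001..10,000,000, and so on.

WINDOW = 5_000_000  # 5 Mb, matching the chunk size in the question


def windows_for_gene(start, end, window=WINDOW):
    """Return the 0-based indices of every window overlapped by [start, end]."""
    if start < 1 or start > end:
        raise ValueError("expected 1 <= start <= end")
    first = (start - 1) // window
    last = (end - 1) // window
    return list(range(first, last + 1))


def window_bounds(index, window=WINDOW):
    """Return the 1-based inclusive (start, end) of window `index`."""
    return index * window + 1, (index + 1) * window


# Hypothetical genes: (name, start, end)
genes = [("geneA", 1_200, 3_400), ("geneB", 4_999_000, 5_002_000)]
for name, s, e in genes:
    print(name, [window_bounds(i) for i in windows_for_gene(s, e)])
```

Here geneB straddles a chunk boundary, so it is reported in two windows. Whether such a gene should count toward one chunk or both is a decision the downstream statistics have to make.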
4 votes, 1 answer

Generating DNA sequences with constraints

I would like some advice on potential strategies to address the following problem. I want to write a program that will generate DNA sequences that are optimized on two constraints based on an input genome: The GC content should be as close to the…
cmdoret
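The excerpt is truncated after the first constraint, so this sketch addresses only GC content. It takes the target GC fraction from an input sequence (a hypothetical toy "genome"). It then places exactly round(target × length) G/C bases, fills the remaining positions with A/T, and shuffles the result. One simple baseline is to generate such candidates and score them against any further constraints.

```python
import random


def gc_content(seq):
    """Fraction of G and C bases in seq (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def random_seq_with_gc(length, target_gc, seed=None):
    """Random DNA whose GC fraction is as close as possible to target_gc.

    Uses exactly round(target_gc * length) G/C bases, then shuffles positions.
    """
    rng = random.Random(seed)
    n_gc = round(target_gc * length)
    bases = [rng.choice("GC") for _ in range(n_gc)]
    bases += [rng.choice("AT") for _ in range(length - n_gc)]
    rng.shuffle(bases)
    return "".join(bases)


# Hypothetical input genome; its GC content becomes the target.
genome = "ATGCGCGTATATGGCCAATTGC"
target = gc_content(genome)
candidate = random_seq_with_gc(1000, target, seed=42)
print(round(target, 3), round(gc_content(candidate), 3))
```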
4 votes, 1 answer

How are PDO and PDX used in computational and predictive models of tumour biology?

There are wet-lab methods that reflect tumor biology, namely patient-derived models: Patient-Derived Xenografts (PDX) and Patient-Derived Organoids (PDO). Are there any databases or computational tools that use the outcomes from PDO/PDX experiments to create a…
0x90
4 votes, 2 answers

Run Nextflow with file dependencies inside a Docker container?

I’m trying to run a Nextflow workflow in a custom Docker container. Without Docker, the workflow succeeds. But running it inside the container leads to an error because a dependent file cannot be found. Here’s a minimal script to illustrate the…
Konrad Rudolph
4 votes, 1 answer

Remove variants that do not map to human genome

[This question was also asked on Biostars] I received an hg38 VCF file whose variants were imputed with 1000 Genomes. I've encountered some issues with the VCF: REF alleles that do not align to a reference genome, ALT alleles that do not appear to…
John Rouhana
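Checking whether each REF allele agrees with the reference genome is a string comparison at the record's position. The following is a minimal sketch that handles only the REF check from the excerpt. It assumes a hypothetical in-memory `reference` dict (in practice, one would load it from the hg38 FASTA) whose contig names follow the same convention as the VCF ("chr1" vs "1"). Records on contigs missing from the reference are dropped.

```python
# Sketch: drop VCF records whose REF allele does not match the reference.
# Assumptions: `reference` maps chromosome names to sequences (e.g. loaded from
# the hg38 FASTA beforehand) and uses the same naming as the VCF ("chr1" vs "1").


def ref_matches(reference, chrom, pos, ref):
    """True if REF equals the reference bases starting at 1-based POS."""
    seq = reference.get(chrom)
    if seq is None:  # contig absent from the reference
        return False
    start = pos - 1
    return seq[start:start + len(ref)].upper() == ref.upper()


def filter_vcf_lines(lines, reference):
    """Yield header lines unchanged and only the records whose REF matches."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        if ref_matches(reference, chrom, pos, ref):
            yield line


# Toy data for illustration only.
reference = {"chr1": "ACGTACGTAC"}
vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr1\t2\t.\tC\tT\n",    # matches (position 2 is C)
    "chr1\t3\t.\tA\tT\n",    # mismatch (position 3 is G)
    "chrUn\t1\t.\tA\tG\n",   # contig not in reference
    "chr1\t5\t.\tACG\tA\n",  # matches (positions 5-7 are ACG)
]
kept = list(filter_vcf_lines(vcf, reference))
print(len(kept))
```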
4 votes, 2 answers

Entrez.efetch returns incomplete genbank records

I am using the Biopython Entrez.efetch command to retrieve all features (CDS, mRNA, ...) of some genomes. In this case (NC_014649, Acanthamoeba polyphaga mimivirus), it works as expected: from Bio import Entrez, SeqIO handle =…
cmdoret
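One thing worth trying, offered as an assumption rather than a confirmed diagnosis of these records, is the "gbwithparts" report type. The NCBI E-utilities efetch service offers it for nucleotide records, and the same report name appears in the GenBank URL in the GFF3 question in this list. It returns the full record, whereas the default GenBank report can omit content for contig-style records. This sketch only builds the efetch URL; it does not download anything.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, rettype="gbwithparts"):
    """Build an efetch URL requesting a nucleotide record as a text report."""
    params = {"db": "nuccore", "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


print(efetch_url("NC_014649"))
```

In Biopython, the corresponding call would pass the same keywords, for example Entrez.efetch(db="nuccore", id=accession, rettype="gbwithparts", retmode="text").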
4 votes, 2 answers

How is BLAST's nr database created?

Is there a paper or web page describing the procedure for creating the nr database used by NCBI's BLAST implementation? I presume it's some type of clustering, but I'm curious about how exactly sequences are condensed into non-redundant…
juniper-
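As we understand NCBI's BLAST database documentation, nr is built by merging entries whose sequences are exactly identical, keeping all of their identifiers, rather than by clustering on similarity. That claim is worth verifying against NCBI's own description. The following is a minimal sketch of exact-duplicate merging on hypothetical protein records, not NCBI's actual pipeline.

```python
# Sketch of exact-duplicate merging: one entry per distinct sequence,
# keeping every identifier that pointed at it. Not NCBI's actual pipeline.


def collapse_identical(records):
    """records: iterable of (identifier, sequence). Returns [(ids, sequence)]."""
    merged = {}  # dicts keep insertion order, so output follows first occurrence
    for ident, seq in records:
        merged.setdefault(seq.upper(), []).append(ident)
    return [(ids, seq) for seq, ids in merged.items()]


# Hypothetical protein records
records = [("protA", "MKVLA"), ("protB", "mkvla"), ("protC", "MKVLG")]
for ids, seq in collapse_identical(records):
    print(seq, ids)
```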
4 votes, 1 answer

Correlating gene expression with qualitative variables

I have a gene expression dataset that I want to investigate. In particular, I would like to understand whether there is any correlation between each gene's expression and some quantitative or qualitative data (say, correlation between gene 'XPTO',…
Sos
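For a quantitative covariate, Pearson correlation is the usual starting point. For a qualitative (categorical) one, a comparable effect size is eta squared, the share of expression variance explained by the groups. The following is a minimal stdlib sketch of both, using hypothetical values for gene 'XPTO'. Significance would come from a test such as one-way ANOVA or Kruskal–Wallis, with multiple-testing correction when looping over every gene.

```python
# Sketch: association between one gene's expression and a sample covariate.
# Quantitative covariate -> Pearson correlation; categorical covariate -> eta
# squared (share of expression variance explained by the groups).
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def eta_squared(values, groups):
    """Between-group sum of squares divided by total sum of squares (0..1)."""
    grand = fmean(values)
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    ss_between = sum(len(vs) * (fmean(vs) - grand) ** 2 for vs in by_group.values())
    ss_total = sum((v - grand) ** 2 for v in values)
    return ss_between / ss_total


# Hypothetical data for gene 'XPTO' across four samples
xpto = [1.0, 2.0, 3.0, 4.0]
age = [10, 20, 30, 40]
genotype = ["wt", "wt", "mut", "mut"]
print(pearson(xpto, age), eta_squared(xpto, genotype))
```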
4 votes, 1 answer

Find the gene at a position from a GFF or GBK file

I have a VCF file with SNPs from a bacterial genome and want to find out whether the SNPs are located inside genes. Is there a CLI tool that takes a VCF file and a GFF or GBK file and returns the names of the genes?
haegglund
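The underlying operation is a point-in-interval lookup. Dedicated tools such as bedtools intersect perform this kind of overlap query at scale. The following is a minimal sketch that assumes a GFF3 file (tab-separated, nine columns, 1-based inclusive coordinates) in which genes have the type "gene" and carry a Name or ID attribute. The annotation lines are toy data, and the SNP positions stand in for values from the VCF POS column. A linear scan per SNP is fine for a bacterial genome; larger inputs would call for sorted intervals or an interval tree.

```python
# Sketch: report the genes that contain each SNP position, using a GFF3 file.
# Assumptions: GFF3 (tab-separated, 9 columns, 1-based inclusive coordinates),
# gene features have type "gene", and names come from the Name or ID attribute.


def parse_gff_genes(lines):
    """Return a list of (seqid, start, end, name) for every 'gene' feature."""
    genes = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        name = attrs.get("Name", attrs.get("ID", "."))
        genes.append((cols[0], int(cols[3]), int(cols[4]), name))
    return genes


def genes_at(genes, seqid, pos):
    """Names of all genes on `seqid` whose span includes `pos`."""
    return [name for sid, start, end, name in genes if sid == seqid and start <= pos <= end]


# Toy annotation; in practice read the lines from the .gff file.
gff = [
    "##gff-version 3\n",
    "chr\tRefSeq\tgene\t100\t500\t.\t+\t.\tID=gene1;Name=dnaA\n",
    "chr\tRefSeq\tCDS\t100\t500\t.\t+\t0\tID=cds1;Parent=gene1\n",
    "chr\tRefSeq\tgene\t450\t900\t.\t-\t.\tID=gene2\n",
]
genes = parse_gff_genes(gff)
for pos in (50, 475, 900):  # SNP positions, e.g. from the VCF POS column
    print(pos, genes_at(genes, "chr", pos))
```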
4 votes, 2 answers

How to dump genes from GenBank in GFF3 format?

This question has also been asked on Biostars. If I look at this record in GenBank, I see about 6k genes: https://www.ncbi.nlm.nih.gov/nuccore/CM000760?report=gbwithparts I'd really like to be able to dump those genes in GFF3 format, but I'm guessing…
Dan
4 votes, 1 answer

Authoritative source on human cytogenetic regions?

I am looking for a database that would keep track of human cytogenetic regions and genomic coordinates per genome assembly. I had expected the Genome Reference Consortium to have it, or Ensembl, or NCBI, but I could not find it. Is there a place…
Ramiro Magno
4 votes, 1 answer

How to represent trans-spliced genes in GTF?

For example, see this gene (nad1) in ENA: http://www.ebi.ac.uk/ena/data/view/ABI60879 If you look at the XML for that gene, you see the following: join(DQ984518.1:324706..325091, complement(DQ984518.1:24417..24498), …
Dan
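Whatever GTF convention one settles on for trans-spliced genes, a first step is to split the location string into segments, each with its own strand. The following is a minimal regex sketch that handles only the "acc:start..end" and "complement(acc:start..end)" forms visible in the excerpt. Other INSDC location operators such as order() or partial-end markers (<, >) are not covered. The example string uses only the two segments shown before the "…".

```python
import re

# One segment: optional "complement(", accession, ":", start "..", end, optional ")".
SEGMENT = re.compile(r"(complement\()?\s*([\w.]+)\s*:\s*(\d+)\s*\.\.\s*(\d+)\s*\)?")


def parse_join(location):
    """Return (seqid, start, end, strand) for each segment of a join(...) location."""
    segments = []
    for m in SEGMENT.finditer(location):
        strand = "-" if m.group(1) else "+"
        segments.append((m.group(2), int(m.group(3)), int(m.group(4)), strand))
    return segments


loc = "join(DQ984518.1:324706..325091,complement(DQ984518.1:24417..24498))"
for seg in parse_join(loc):
    print(seg)
```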
4 votes, 2 answers

After artificially creating events in a FASTA file, how do I keep track of the old coordinates?

I'm beginning with the reference genome in FASTA format, hg19. I am reading the sequence into a Python dictionary with Biopython: genome_dictionary = {} for seq_record in SeqIO.parse(input_fasta_file, "fasta"): …
ShanZhengYang
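One way to keep the old coordinates is to record every event in original-sequence coordinates and derive the mapping from those records. The following is a minimal sketch that uses a hypothetical (kind, pos, length) event format, with 0-based positions and non-overlapping events. It builds the edited sequence and maps any original position to its new position, or to None if the base was deleted. For hg19, one would keep a separate event list per chromosome and, with many events, sort them and use bisect instead of the linear loop.

```python
# Sketch: keep a mapping from original to edited coordinates after simulated
# insertions and deletions. Hypothetical event format (kind, pos, length), with
# pos a 0-based coordinate in the ORIGINAL sequence; events must not overlap.


def apply_events(seq, events, insert_base="N"):
    """Build the edited sequence; insertions go immediately before `pos`."""
    inserts = {pos: length for kind, pos, length in events if kind == "ins"}
    deleted = set()
    for kind, pos, length in events:
        if kind == "del":
            deleted.update(range(pos, pos + length))
    out = []
    for i, base in enumerate(seq):
        out.append(insert_base * inserts.get(i, 0))
        if i not in deleted:
            out.append(base)
    out.append(insert_base * inserts.get(len(seq), 0))  # insertion at the very end
    return "".join(out)


def old_to_new(old_pos, events):
    """Map an original 0-based position to the edited sequence (None if deleted)."""
    shift = 0
    for kind, pos, length in events:
        if kind == "ins" and pos <= old_pos:
            shift += length
        elif kind == "del":
            if pos <= old_pos < pos + length:
                return None
            if pos + length <= old_pos:
                shift -= length
    return old_pos + shift


genome = "ACGTACGT"
events = [("ins", 2, 3), ("del", 5, 2)]  # insert 3 bases before index 2; delete 5..6
edited = apply_events(genome, events)
print(edited, [old_to_new(i, events) for i in range(len(genome))])
```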
4 votes, 3 answers

What command/invocation is used to generate the NCBI 16SMicrobial blastdb?

I'm looking for the exact invocation used to generate the 16SMicrobial database that you can download from here: https://ftp.ncbi.nlm.nih.gov/blast/db/ I'm hoping to create the same type of blastdb with the same type of metadata with custom…
amblina
4 votes, 4 answers

Analysis of differential transcript usage (DTU)

Recent breakthroughs in bioinformatics tools for quantification (e.g. Cufflinks, Kallisto, Salmon) and tools that can identify differential transcript usage (DTU) (e.g. DRIMSeq, Cufflinks) mean that from RNA-seq data we can now relatively…
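The quantity behind DTU is each transcript's share of its gene's expression, compared between conditions. Dedicated tools such as DRIMSeq fit a statistical model and test for significance. The following is only a minimal sketch of the effect size. It assumes hypothetical per-sample {transcript: estimated count} dicts (for example, taken from Salmon or Kallisto output), a transcript-to-gene map, and that every sample lists every transcript.

```python
# Sketch: per-transcript change in usage between two conditions.
# Usage = a transcript's share of its gene's total expression in a sample.
# Inputs are hypothetical: per-sample {transcript: estimated count} dicts, e.g.
# taken from Salmon or Kallisto output, plus a transcript -> gene map.
from collections import defaultdict
from statistics import fmean


def transcript_proportions(counts, tx2gene):
    """Return {transcript: fraction of its gene's total count} for one sample."""
    gene_totals = defaultdict(float)
    for tx, c in counts.items():
        gene_totals[tx2gene[tx]] += c
    return {
        tx: c / gene_totals[tx2gene[tx]] if gene_totals[tx2gene[tx]] else 0.0
        for tx, c in counts.items()
    }


def delta_usage(samples_a, samples_b, tx2gene):
    """Mean usage in condition B minus mean usage in condition A, per transcript."""
    props_a = [transcript_proportions(s, tx2gene) for s in samples_a]
    props_b = [transcript_proportions(s, tx2gene) for s in samples_b]
    return {
        tx: fmean(p[tx] for p in props_b) - fmean(p[tx] for p in props_a)
        for tx in tx2gene
    }


tx2gene = {"t1": "g1", "t2": "g1", "t3": "g2"}
cond_a = [{"t1": 80, "t2": 20, "t3": 50}, {"t1": 90, "t2": 10, "t3": 40}]
cond_b = [{"t1": 30, "t2": 70, "t3": 60}, {"t1": 20, "t2": 80, "t3": 10}]
print(delta_usage(cond_a, cond_b, tx2gene))
```

In this toy example, gene g1 switches its dominant transcript from t1 to t2, while the single-transcript gene g2 shows no change in usage even though its total count varies.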