Most Popular

1500 questions
4 votes, 1 answer

How to solve Nextflow error: "Trace file already exists"?

When trying to run epi2me-labs/wf-artic, I get the following error: ❯ nextflow run epi2me-labs/wf-artic \ -r v0.3.18 --fastq ~/Downloads/barcode95.fastq.gz \ --scheme_version Midnight-ONT/V3 N E X T F L O W ~ version 22.10.1 Launching…
Cornelius Roemer
4 votes, 1 answer

Why do we need to find minimum energy in a protein chain?

"High–quality protein backbone reconstruction from alpha carbons using Gaussian mixture models". The above research paper is about a software tool for reconstructing a protein's main-chain model from only its alpha-carbon backbone (aka C-alpha…
user366312
4 votes, 1 answer

How to promote assemblies into genomes in NCBI?

Note: I've never submitted an assembly/genome to NCBI, so excuse me if my perspective is flawed. I'm working with Drosophila subobscura (spring fruit fly). I see here https://www.ncbi.nlm.nih.gov/data-hub/genome/?taxon=7241 that there are at least 2…
gl00ten
4 votes, 0 answers

Write a bash script to run gatk, fix errors with input, and rerun until completion

I have a bam file that I want to run through GATK's SplitNCigarReads tool. Because of the way the bam file was generated, the program will often fail, with an error message stating: ##### ERROR MESSAGE: Bad input: Cannot split this read (might be…
kylep
4 votes, 1 answer

How does one distinguish nuclear DNA from mitochondrial DNA when doing WGS?

I'm interested in doing de-novo sequencing but also phylogenetic analysis. In particular, after de-novo sequencing and annotating the genome, I need to align the CO1 gene and the nuclear 28S rRNA gene of several species. When extracting DNA and…
Caterina
4 votes, 1 answer

Find SNPs in yeast genomes

I'm a new bioinformatics scientist working for a yeast genetics company. Objective: to create a database of yeast genomes from NCBI and identify SNP variants. My pipeline: FastQC, Trimmomatic, BWA, GATK. The method being to check the quality of the…
rimo
4 votes, 2 answers

Downloading genomic protein files from accessions in Python

I am trying to download the _protein.faa.gz files for genomes given their accession numbers through Python. Ideally, I would like to do this without third-party libraries. Essentially, what I have is a list of just the GCA or GCF accessions. The…
Grimey
4 votes, 1 answer

Which atoms are not found in protein PDB files?

I am developing an educational bioinformatics framework. I need to know which atoms in the following list are never found in any PDB file: Atomic radii of the elements (data page)
user366312
4 votes, 0 answers

Retrieve protein sequence from MGnify given only accession code

I only have the accession codes of several proteins from the MGnify database (https://www.ebi.ac.uk/metagenomics/). I would like to retrieve the full amino acid sequence data from the database, but I have not been able to find a way to do this using…
ProteinGuy
4 votes, 2 answers

Construct phylogeny from a FASTA file

I have a set of 189 taste receptor protein sequences (not aligned) in a FASTA file. I would like to get the phylogenetic tree in Newick format. I was previously using https://ngphylogeny.fr/. Unfortunately, it is not working now due to some bugs. Necessarily, I…
4 votes, 1 answer

What is the contributing factor in single measurements of nanopore sequencing? One base, a k-mer, or the difference between the base that left and the base that entered?

My question is about nanopore sequencing and specifically about the current that is measured by the device in each measurement. The question is: In each measurement in nanopore sequencing, the change in the current between two sides of the membrane…
Marjan
4 votes, 2 answers

Does the number of RNA reads per cell obtained from a 10X scRNA experiment depend on the amount of mRNA in a given cell?

As we know, the number of RNA reads per cell obtained from a 10X scRNA experiment varies between cells. I wonder whether this is an effect of technical issues, or does the number of RNA reads per cell obtained from the 10X scRNA experiment depend on the amount…
4 votes, 1 answer

How to manage memory constraints when analyzing a large number of gene count matrices? I keep running out of RAM with my current pipeline

I have several hundred scRNA-seq count matrices, each from a different sample. For my other dataset containing a few dozen samples, I simply merged everything together into one Seurat object, but that won’t work here as far as I can tell. When I try…
4 votes, 2 answers

Error in seaborn plot "Horizontal orientation requires numeric `x` variable"

I am trying to plot a box plot with seaborn with the following code plot = sns.boxplot( y='Min PPI distance', x='Synergy_percent', color='white', orient="h", data=synergy_df ) plot.set( xlabel='Synergy (%)', ylabel='Min.…
Megha
4 votes, 1 answer

Using QCTOOL v2 to process UK Biobank .bgen files - why so slow?

I’m currently using QCTOOL v2 to process imputed .bgen files from UK Biobank, but they seem to be processing very slowly. Is this normal? My command is pretty basic; I’m filtering out a list of SNPs and samples: /path_to/qctool \ -g…
Cat