Most Popular

1500 questions
3
votes
1 answer

Why do BLASTn and prokka not seem to be searching the whole fasta file?

When I use blastn and prokka (I will detail exactly how I did so below) on a 2.8 million bp fasta file I get output start/end numbers that do not seem to cover the entire genome. Starting with a .fna genome such as genome.fna I ... 1 -…
Daniel Harris
3
votes
4 answers

Purpose of ### (3 consecutive pound signs / hashtags / octothorpes) in GFF3

I downloaded the annotation of the C. elegans genome in GFF3 format from Ensembl. I typed the following command, hoping to get the header of the file (lines starting with #). grep '^#' Caenorhabditis_elegans.WBcel235.95.gff3 This returned 46798…
Biomagician
3
votes
1 answer

Determining Read Groups

Which Read Groups are correct: java -Xmx4G -jar picard.jar AddOrReplaceReadGroups \ I=$SNIC_TMP/WCRO84_S23_L005.bam \ O=WCRO84_S23_L005.bam \ RGID=WCRO84_S23_L005 \ RGLB=WCRO84 \ RGPL=illumina \ RGPU=WCRO84_S23_L005 \ RGSM=WCRO84 or java -Xmx4G…
user977828
3
votes
3 answers

Which gene should I select from this Q-Q plot?

I have a Q-Q plot of my whole-genome sequencing data; the plot is meant to show possibly significant driver genes. I tried to read about Q-Q plots, but people only talk about the skewness, while I want to know which of these two genes is more…
Zizogolu
3
votes
2 answers

Selecting 65000 SNPs where AF is close to 0.5 in all or most populations

I am evaluating the tool somalier (https://github.com/brentp/somalier) and I need to create a list of about 65,000 SNPs where the allele frequency (AF) is as close to 0.5 as possible across the most representative set of populations possible with…
719016
3
votes
2 answers

Python - Finding a motif - input: a txt file with 10 sequences and 10 motifs

When I run my BruteForce function with only one input it works and the result is correct. def BruteForce(s, t): occurrences = [] for i in range(len(s)-len(t)+1): # loop over alignment match = True for j in range(len(t)): #…
Beeta
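A minimal sketch of how the truncated BruteForce function in the excerpt above might be completed and applied to several sequence/motif pairs. The excerpt cuts off inside the inner loop, so the character comparison, the early `break`, and the pairing of each sequence with one motif via `zip` are assumptions, as are the example sequences. The function is renamed `brute_force` here only to follow Python naming conventions.

```python
def brute_force(s, t):
    """Return the 0-based start positions where motif t occurs in sequence s."""
    occurrences = []
    for i in range(len(s) - len(t) + 1):  # loop over alignments
        match = True
        for j in range(len(t)):  # loop over motif characters
            if s[i + j] != t[j]:  # assumed comparison (excerpt is truncated here)
                match = False
                break
        if match:
            occurrences.append(i)
    return occurrences


# Hypothetical inputs: in the question these would be read from the txt file.
sequences = ["GATATATGCATATACTT", "ACGTACGT"]
motifs = ["ATAT", "CGT"]

# Search each sequence for its own motif (pairing by position is an assumption).
results = [brute_force(s, t) for s, t in zip(sequences, motifs)]
for s, t, hits in zip(sequences, motifs, results):
    print(t, "in", s, "->", hits)
```

Calling the function once per pair keeps the single-input version unchanged, which is probably the simplest way to extend it to many inputs.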
3
votes
1 answer

Percentage distribution of cells in all clusters based on their treatment condition?

I have 2151 cells, which I clustered with Seurat into 5 clusters. With the code below, I can get the number of cells per cluster and per condition: number_perCluster<- table(object@meta.data$conditions, …
Charles
3
votes
2 answers

wget for links inside html pages

I am trying to download a file from the following repository: https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR7276474 As you can see, there are several layers to the webpage. For example, clicking on the download tab doesn't change the URL and,…
h3ab74
3
votes
2 answers

Hardware Requirements (specs) for Bioinformatics-dedicated desktop

I know this is a somewhat general or vague question, but I’m interested in your opinions. I must build a desktop for general bioinformatics activities with human genomes. I will work with Python and R libraries, and I will also need to install de novo…
Lou_A
3
votes
1 answer

Seeking explanation of the hg38 files downloaded from bowtie 2 website

I downloaded the H. sapiens, NCBI GRCh38 files from Bowtie's website. After unzipping, there are 6 files, 4 that end in set 1.ebwt, set2.ebwt, set3.ebwt, and set4.ebwt and two that end in set.rev.1.ebwt and set.rev.2.ebwt. I am unable to find any…
user4050
3
votes
1 answer

Python string editing in Snakemake

I'm running a custom perl script using Snakemake. I use this rule: rule complexity_20mer_counter: input: os.path.join(fastq_trimmed_dir, '{sample}_{read}' + fastq_extension.split('.')[0] + '_val_' + '{read}'.strip('R') + '.fq.gz') …
Freek
3
votes
1 answer

Restricting match output for multiple sequence alignment using mafft?

I aligned roughly 5k sequences with mafft and got my output. However, I want to restrict the output to only the conserved segments of a specific length (let's say 25 bp). Does mafft have parameters for this that I seem to be missing?
user4035
votes
1 answer

Output from vcftools missingness

I'm new to filtering VCF data with vcftools. I performed variant calling on my dataset (chr22, Homo sapiens). I'd like to remove sites that are missing in more than 5% of individuals. vcftools --missing-site --vcf updated_ids68.vcf This…
Death Metal
3
votes
1 answer

What does liability mean in GWAS heritability?

I am reading about heritability in GWAS. Heritability is usually said to be calculated on a liability scale, but after searching for the word "liability", I still don't clearly understand what it means. Does anyone have any…
3
votes
4 answers

Can a gff file be converted to a fasta file?

I downloaded an annotated genome file in GFF format here. I would like to use it for proteomics, but I need it in FASTA format. Is there any tool that converts GFF to annotated FASTA? I see the file contains the genome sequence in FASTA format,…
Soerendip