Most Popular
1500 questions
3
votes
1 answer
Why do BLASTn and prokka not seem to be searching the whole fasta file?
When I use blastn and prokka (I will detail exactly how I did so below) on a 2.8 million bp fasta file I get output start/end numbers that do not seem to cover the entire genome.
Starting with a .fna genome such as genome.fna I ...
1 -…
Daniel Harris
- 303
- 2
- 7
3
votes
4 answers
Purpose of ### (3 consecutive pound signs / hashtags / octothorpes) in GFF3
I downloaded the annotation of the C. elegans genome in GFF3 format from Ensembl.
I typed the following command, hoping to get the header of the file (lines starting with #).
grep '^#' Caenorhabditis_elegans.WBcel235.95.gff3
This returned 46798…
Biomagician
- 2,459
- 16
- 30
3
votes
1 answer
Determining Read Groups
Which Read Groups are correct:
java -Xmx4G -jar picard.jar AddOrReplaceReadGroups \
I=$SNIC_TMP/WCRO84_S23_L005.bam \
O=WCRO84_S23_L005.bam \
RGID=WCRO84_S23_L005 \
RGLB=WCRO84 \
RGPL=illumina \
RGPU=WCRO84_S23_L005 \
RGSM=WCRO84
or
java -Xmx4G…
user977828
- 453
- 3
- 9
3
votes
3 answers
Which gene should I select from this qqplot?
I have a qqplot of my whole-genome sequencing data; the plot shows possibly significant driver genes. I tried to read about qqplots, but people only talk about skewness, while I want to know which of these two genes is more…
Zizogolu
- 2,148
- 11
- 44
3
votes
2 answers
Selecting 65000 SNPs where AF is close to 0.5 in all or most populations
I am evaluating the tool somalier (https://github.com/brentp/somalier) and I need to create a list of about 65,000 SNPs where the allele frequency (AF) is as close to 0.5 as possible across the most representative set of populations possible with…
719016
- 2,324
- 13
- 19
3
votes
2 answers
Python - Finding a motif - input: a txt file with 10 sequences and 10 motifs
When I run my BruteForce function with only one input it works and the result is correct.
def BruteForce(s, t):
    occurrences = []
    for i in range(len(s)-len(t)+1): # loop over alignment
        match = True
        for j in range(len(t)): #…
Beeta
- 31
- 1
- 3
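The excerpt above cuts off inside the inner loop, so the following is only a sketch of how a brute-force motif search of that shape is commonly completed. The body after the truncation point, the name `brute_force`, and the helper `search_pairs` (which assumes the k-th sequence is paired with the k-th motif) are illustrative assumptions, not the asker's code.

```python
def brute_force(s, t):
    """Return every 0-based start index where motif t occurs in sequence s."""
    occurrences = []
    for i in range(len(s) - len(t) + 1):  # loop over alignments
        match = True
        for j in range(len(t)):  # compare character by character
            if s[i + j] != t[j]:
                match = False
                break
        if match:
            occurrences.append(i)
    return occurrences


def search_pairs(sequences, motifs):
    """Hypothetical helper: search the k-th motif in the k-th sequence.

    The pairing rule is an assumption; the original question does not
    say how the 10 sequences and 10 motifs relate to each other.
    """
    return [brute_force(s, t) for s, t in zip(sequences, motifs)]
```

Calling `search_pairs` with two lists read from a text file would give one list of hit positions per sequence/motif pair; overlapping hits are reported because the outer loop advances one position at a time.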
3
votes
1 answer
Percentage distribution of cells in all clusters based on their treatment condition?
I have 2151 cells, which I clustered with Seurat into 5 clusters. With the code below, I can get the number of cells per cluster and per condition:
number_perCluster<- table(object@meta.data$conditions,
…
Charles
- 537
- 6
- 21
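The question above is about R and Seurat, but the underlying arithmetic can be sketched independently of either. The following assumes two parallel lists, one condition label and one cluster label per cell, and normalises within each condition; whether percentages should instead be taken within each cluster is not stated in the excerpt. The function name `percent_per_cluster` is hypothetical.

```python
from collections import Counter


def percent_per_cluster(conditions, clusters):
    """Percentage of each condition's cells that fall in each cluster.

    conditions and clusters are parallel sequences (one entry per cell).
    Returns {(condition, cluster): percentage}; for a fixed condition the
    percentages sum to 100.
    """
    pair_counts = Counter(zip(conditions, clusters))  # cells per (condition, cluster)
    condition_totals = Counter(conditions)            # cells per condition
    return {
        (cond, clu): 100.0 * n / condition_totals[cond]
        for (cond, clu), n in pair_counts.items()
    }
```

Swapping `condition_totals[cond]` for a per-cluster total would give the composition of each cluster by condition instead.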
3
votes
2 answers
wget for links inside html pages
I am trying to download a file from the following repository: https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR7276474
As you can see, there are several layers to the webpage. For example, clicking on the download tab doesn't change the URL and,…
h3ab74
- 836
- 5
- 14
3
votes
2 answers
Hardware Requirements (specs) for Bioinformatics-dedicated desktop
I know this is a somewhat general or vague question, but I’m interested in your opinions. I must build a desktop for general bioinformatics activities with human genomes. I will work with Python and R libraries, I will also need to install denovo…
Lou_A
- 361
- 1
- 4
- 11
3
votes
1 answer
Seeking explanation of the hg38 files downloaded from bowtie 2 website
I downloaded the H. sapiens, NCBI GRCh38 files from Bowtie's website. After unzipping, there are 6 files, 4 that end in set 1.ebwt, set2.ebwt, set3.ebwt, and set4.ebwt and two that end in set.rev.1.ebwt and set.rev.2.ebwt.
I am unable to find any…
user4050
- 31
- 1
3
votes
1 answer
Python string editing in Snakemake
I'm running a custom Perl script using Snakemake. I use this rule:
rule complexity_20mer_counter:
    input:
        os.path.join(fastq_trimmed_dir, '{sample}_{read}' + fastq_extension.split('.')[0] + '_val_' + '{read}'.strip('R') + '.fq.gz')
…
Freek
- 563
- 4
- 11
3
votes
1 answer
Restricting match output for multiple sequence alignment using mafft?
So I aligned roughly 5k sequences and I got my output using mafft. However, I want to restrict the output to only present conserved segments of a specific length (let's say 25bp). Does mafft have parameters to set this that I seem to be missing?
user4035
- 31
- 1
3
votes
1 answer
Output from vcftools missingness
I'm new to data filtering on vcf data and vcftools.
I performed variant calling on my dataset, CHR22, Homo sapiens. I'd like to remove sites that are missing in more than 5% of individuals.
vcftools --missing-site --vcf updated_ids68.vcf
This…
Death Metal
- 265
- 1
- 7
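To make the 5% criterion in the question above concrete, here is a minimal sketch that computes per-site missingness directly from VCF text lines. It is not a reimplementation of vcftools; it assumes tab-separated records with GT as the first FORMAT field, and it treats any genotype containing `.` (including partially missing calls such as `./1`) as missing, which is an assumption. The function names are hypothetical.

```python
def site_missingness(vcf_lines):
    """Yield (chrom, pos, fraction_missing) for each VCF record line."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        # Sample columns start at index 9; GT is assumed to be the first subfield.
        genotypes = [sample.split(":")[0] for sample in fields[9:]]
        if not genotypes:
            continue  # sites-only VCF: nothing to measure
        missing = sum(1 for gt in genotypes if "." in gt)
        yield fields[0], int(fields[1]), missing / len(genotypes)


def keep_sites(vcf_lines, max_missing_fraction=0.05):
    """Return (chrom, pos) for sites missing in at most the given fraction of samples."""
    return [
        (chrom, pos)
        for chrom, pos, frac in site_missingness(vcf_lines)
        if frac <= max_missing_fraction
    ]
```

Passing an open file handle for the VCF as `vcf_lines` works because the functions only iterate over lines.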
3
votes
1 answer
What does liability mean in GWAS heritability?
I am reading about heritability in GWAS. Authors usually say to calculate heritability on a liability scale, but after searching for the word "liability", I still don't clearly understand what it means. Does anyone have any…
user2842390
- 55
- 5
3
votes
4 answers
Can a gff file be converted to a fasta file?
I downloaded an annotated genome file in gff format here. I would like to use it for proteomics, but I need it in fasta format. Is there any tool that converts gff to annotated fasta? I see the file contains the genome sequence in fasta format,…
Soerendip
- 1,295
- 11
- 22
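The question above notes that the GFF file already embeds the genome sequence in fasta format. In GFF3, such a section is introduced by a `##FASTA` directive line, so the embedded sequence can be pulled out with a few lines of code. This sketch only recovers that embedded nucleotide section; it does not produce per-gene or protein sequences, and the function name is hypothetical.

```python
def extract_fasta_section(gff_lines):
    """Return the lines that follow the ##FASTA directive in a GFF3 file.

    Returns an empty list if the file has no embedded FASTA section.
    """
    fasta_lines = []
    in_fasta = False
    for line in gff_lines:
        if in_fasta:
            fasta_lines.append(line.rstrip("\n"))
        elif line.strip() == "##FASTA":
            in_fasta = True  # everything after this directive is sequence data
    return fasta_lines
```

Writing the returned lines to a new file, one per line, gives a plain fasta file of the embedded sequences.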