Most Popular
1500 questions
4
votes
1 answer
Calculating mutation frequencies for every gene
I have a mutation dataset and I want to calculate mutation frequencies across all genes.
df (this is only a small subset of the data)
Gene name Sample id MUTATION_ID Mutation Description
ARID1B 2719660 171258500 Substitution -…
Priya
- 351
- 1
- 3
- 8
4
votes
2 answers
How is the DNA Integrity Number (DIN) calculated in Bioanalyzer/TapeStation?
For DNA/RNA quantification machines like the Bioanalyzer or TapeStation, the DNA Integrity Number (DIN) or RNA Integrity Number (RIN) is quoted as a measure of the fragmentation of the material.
How is the DNA Integrity Number (DIN)…
719016
- 2,324
- 13
- 19
4
votes
2 answers
Picking values based on conditions using a loop
I have a dataset with the following information. Column C1_1 is the chromosome number, C1_2 is the SNP position, and C3 is the p-value. I want to pick the most significant association within a genomic region of 3000 bp:
C1 C2 C3
L01_005000 L002g034 …
user1567654
- 53
- 4
4
votes
4 answers
How to get a file with the number of reads for several fastq.gz files?
I have generated several FASTQ files and I would like to know the number of reads in each of them.
I am planning to run FastQC on the files, which I know would give me the number of reads per sample, but it would be a lot easier for me if I could get…
pythonbeginner
- 115
- 4
4
votes
2 answers
Relation between Illumina sequencing primer and viral sequences
While dealing with a problematic sequencing run, I found this over-represented sequence:
GGAAGAGCACACGTCTGAACTCCAGTCACTAGCTTATCTCGTATGGCGTCTTCTGCTTG
It is clearly related to the Illumina sequencing primer, as shown by this read structure description (note…
mox
- 333
- 2
- 8
4
votes
1 answer
Trim reads 1 kb upstream of sequence
I need a quick way to trim multiple reads in a FASTA file. I need to trim everything that is 1 kbp upstream of this sequence: AAGAGATGTTCAATCGTTTAAACAAATTCCAAGCTGCTTTAGCTTTGGCCCTTTACTCTCA.
I figure a quick Python script might be the way to go, but I'm…
rimo
- 963
- 1
- 15
4
votes
1 answer
How to fix the error in a Kruskal-Wallis rank sum test comparing guides in Python, and is this the right test for my data?
df
gene_name  guide_1  guide_1_new  Correlation
MMP-1      A        A            1
MMP-1      A        B            0.426115
MMP-1      A        C            0.522499
MMP-1      A        D            0.431587
MMP-1      B        A            0.426115
MMP-1      B        B            1
MMP-1      B        C            0.60113
MMP-1      B        D            0.534858
MMP-1      C        A            …
Priya
- 351
- 1
- 3
- 8
4
votes
2 answers
Reordering scaffolds according to a reference without a genetic map
I am trying to reorder scaffolds of a rice species, but no genetic map is available right now. Oryza sativa Japonica is a close relative of this rice species. MUMmer was used to do a whole-genome alignment, and I am trying to reorder scaffolds…
l0o0
- 325
- 1
- 8
4
votes
2 answers
ERR: error while loading shared libraries: libjulia.so.1: cannot open shared object file [Julia] - Ubuntu 22.04
I'm trying to build a Dockerfile. It has a tool called Atria that utilizes Julia. Since Ubuntu 22.04 does not have a Julia package, I had to resort to installing it. The following is the Dockerfile:
FROM ubuntu:22.04
# Install tzdata
RUN apt-get…
pubsurfted
- 383
- 1
- 6
4
votes
2 answers
Searching motifs in sequence and their frequencies
This is a two-part question.
I am searching for a motif, and in that search I also want to find the total number of sequences in my FASTA file, but the code I wrote is not yielding that; please see the attempt below. The second part is when I have…
thole
- 143
- 5
4
votes
1 answer
Alternatives to MEDIPS to analyse MeDIP datasets
MEDIPS is an established tool with functions for the quality control and analysis of data derived from immunoprecipitation (IP)-seq samples, like Methylation IP sequencing datasets.
I would like to know if there are any other tools I should consider…
719016
- 2,324
- 13
- 19
4
votes
1 answer
How to run a Nextflow process for each file generated by another process separately - Input tuple does not match input set cardinality error
I have a process called FINDTAIL that generates a different number of files depending on the input data. It is either two files or four files, i.e.
1. read_1{,.pr,.sl}.fasta and read_2{,.pr,.sl}.fasta
or
2. read_1.pr.fasta, read_1.sl.fasta,…
pubsurfted
- 383
- 1
- 6
4
votes
1 answer
Kallisto error: index input file could not be opened!
I am utilizing Kallisto in Anaconda/miniconda for RNA sequencing. I have successfully made an index; using said index to analyze RNA sequencing data has yielded an error of:
"Error: index input file could not be opened!"
The implication is that…
Tristan
- 43
- 4
4
votes
1 answer
RNA folding at a specific temperature with ViennaRNA in Python
I am trying to get dot-bracket notations of single-stranded RNA via the ViennaRNA Python package (https://pypi.org/project/ViennaRNA/) at different temperatures. I have read in the docs…
gojih
- 41
- 1
4
votes
1 answer
Breseq error (code 137). Any ideas?
Here is the code I ran; nothing fancy.
breseq -l 110 -o RifR_align -r Big_burk_assembly.fasta RifRNano_nanopore.fastq.gz
This was the output.
NOW PROCESSING Read alignment to reference genome
[system] bowtie2-build -q…
Liam T Sullivan
- 41
- 1