Most Popular
1500 questions
4
votes
1 answer
nextflow: tumor normal sample - how to make code organic
I'm developing a pipeline in Nextflow that uses these tools: fastp, bwa mem index, bwamem, gatk mark duplicates, gatk setupnmd, gatk applybqsr, gatk recalibrate. The inputs are FASTQ.gz files of normal and tumor DNA samples.
I'm interested in using mutect2, lancet…
Death Metal
- 265
- 1
- 7
4
votes
2 answers
How can I give different names to files in a directory with a for loop in a bash script?
I'm expecting to get 17 different paired-end fastq files (34 in total), so I want to make a bash script to just run my code through all the fastq files in a directory at once. How can I change the name of the input and output files each time the…
Pablo O. García Díaz
- 43
- 4
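One way to drive a per-sample loop like the one asked about above is to derive every input and output name from a shared sample prefix. The sketch below is only an illustration in Python: the `_R1`/`_R2` tags, the `.fastq` extension, and the function name `pair_fastq_files` are assumptions, since the question excerpt does not show the actual file names.

```python
from pathlib import Path


def pair_fastq_files(directory, r1_tag="_R1", r2_tag="_R2", ext=".fastq"):
    """Return (sample, r1_path, r2_path) for every sample with both mates present.

    The tags and extension are hypothetical defaults; adjust them to the
    real naming scheme of the sequencing run.
    """
    pairs = []
    for r1 in sorted(Path(directory).glob(f"*{r1_tag}{ext}")):
        # Strip the mate tag and extension to recover the sample prefix.
        sample = r1.name[: -len(r1_tag + ext)]
        r2 = r1.with_name(f"{sample}{r2_tag}{ext}")
        if r2.exists():
            pairs.append((sample, r1, r2))
    return pairs


# Each iteration has its own sample name, so output names can be built from it.
for sample, r1, r2 in pair_fastq_files("."):
    print(sample, r1.name, r2.name, f"{sample}.out")
```

Samples missing their second mate are skipped rather than raising an error, which is a design choice for the sketch and may not suit every workflow.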
4
votes
2 answers
Keeping DNA sequence after changing FASTA header on command line
I have a FASTA header that looks like this:
>7c8250ef-c89f-4d42-9d48-12c8fe245fb2 runid=606f271fc97598006ba5a922136a2c304cef75a5 sampleid=Pool12-1 read=19008 ch=301 start_time=2021-07-03T08:48:18Z barcode=barcode01
And I am able to change it to the…
rimo
- 963
- 1
- 15
4
votes
1 answer
Nextflow - Process has already been used
I am trying to write the Nextflow script below:
/*
* pipeline input parameters
*/
params.reads = "/path/to/fq/files/"
params.guide_library = "/path/to/guidelibrary/"
params.outdir = "results"
log.info """\
DUAL GUIDE CRISPR QC - N F P I P E L…
user17657
- 41
- 1
4
votes
1 answer
How to get a GISAID account? I registered months ago, still no reply!
Inspired by amateur variant hunters, I would like to join the Pango lineage proposal community and help contribute to variant surveillance.
However, I cannot seem to get access to GISAID, the platform that exclusively hosts a large share of…
AppleBees
- 43
- 4
4
votes
0 answers
Salmon Pseudo count when dealing with male and female RNA-seq data
I've generated quant seq data that I intend to use to compare male and female gene expression, with a focus on the sex chromosomes.
My species (three-spined stickleback) has a classic XY sex determination system, and I have a male and a female…
Florent Sylvestre
- 41
- 2
4
votes
1 answer
What does the "X_OS_IND" column mean?
I want to do a survival analysis using a subset of the TCGA LUAD dataset, whose identifiers are located here.
> head(LUAD_clinicalMatrix[,c("X_EVENT","X_OS_IND")])
X_EVENT X_OS_IND
1 NA NA
2 0 0
3 1 1
4 0 …
WangShixiang
- 41
- 1
4
votes
3 answers
mapping nucleotide sequences
I am trying to map multiple sequences against a reference genome. I tried the following code, but the output file ends in .fastq.sam and I need only .sam. How do I get it?
for i in test/*.fastq;
do
./minimap2 -ax map-ont reference.fna $i > $i.sam;
…
hgf
- 43
- 3
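The `.fastq.sam` names in the mapping question above come from appending `.sam` to the full input name. In bash, the expansion `${i%.fastq}.sam` strips the old extension first. The same idea is sketched here in Python, where `sam_path` is a hypothetical helper name and the sample paths are made up for illustration.

```python
from pathlib import Path


def sam_path(fastq_path):
    """Return the input path with its final extension replaced by .sam."""
    return Path(fastq_path).with_suffix(".sam")


# Hypothetical input names; only the extension handling matters here.
for fastq in ["test/sample1.fastq", "test/sample2.fastq"]:
    print(fastq, "->", sam_path(fastq))
```

Note that `with_suffix` replaces only the last extension, so a name such as `reads.fastq.gz` would become `reads.fastq.sam` and would need extra handling.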
4
votes
1 answer
How can I make this Biopython program (to correct erroneous barcodes) run faster, and is there any alternative method?
This question has also been asked on Biostars
I am looking forward to getting a valuable suggestion for a bioinformatic problem.
Background:
Currently, I am performing a de novo whole genome assembly. At the stage of barcode correction, I lost…
VIJITH KUMAR
- 41
- 2
4
votes
4 answers
what percentage of the human genome is MAPQ=0?
When doing Illumina 2x150bp sequencing of genomic DNA, and after aligning the reads to GRCh38, what percentage of the non-N fraction of the human genome is MAPQ=0? That is, what part corresponds to regions that can't be uniquely mapped with 2x150bp…
719016
- 2,324
- 13
- 19
4
votes
1 answer
What is the NCBI's definition of an "atypical genome"?
Using the new NCBI Datasets platform, you can browse the collection of genomes associated with one or more taxa. For example, searching Pseudomonas aeruginosa returns 19,878 genomes as of 29 March 2023.
In the search filtering tab, they give the…
acvill
- 613
- 3
- 12
4
votes
2 answers
How can I use Arlequin via the command line?
I've got a decent knowledge of programming (incl. bash scripting) but I fail to understand how Arlequin works. Could you please help me with a very simple reproducible example on how to use Arlequin via the command line?
As the .zip file comes with…
Remi.b
- 203
- 1
- 8
4
votes
1 answer
Identifying somatic mutations in cell lines
I would like to identify the somatic mutations present in a cell line and characterise the genes that are potentially affected by those mutations. For example, are there oncogenes mutated in a subpopulation in the cell line?
I currently have access…
Macintosh
- 160
- 6
4
votes
2 answers
Why does this nextflow script finish after running one sample?
I have 36 samples in total in the bam_files folder, with names like 20230306_CH_EP_C01.md.bam
I expect the code to process all 36 samples, one by one, but the run stopped after only one sample.
I have searched online, but did not…
cautree
- 139
- 6
4
votes
3 answers
How to retrieve sequences from a txt file by protein names from other txt file?
I want to retrieve the sequence line and the secondary structure line for each protein name listed in the header.txt file:
The text file (header.txt) contains protein names that look like this:
1XXIJ
1P0EA
1H9HI
And the other…
Amal
- 81
- 3