Most Popular

1500 questions
4
votes
1 answer

nextflow: tumor normal sample - how to make code organic

I'm developing a pipeline in Nextflow that uses these tools: fastp, bwa mem index, bwamem, gatk mark duplicates, gatk setupnmd, gatk applybqsr, gatk recalibrate. I run it on FASTQ.gz files from normal and tumor DNA samples. I'm interested in using mutect2, lancet…
Death Metal
  • 265
  • 1
  • 7
4
votes
2 answers

How can I give different names to files in a directory with a for loop in a bash script?

I'm expecting to get 17 different paired-end fastq files (34 in total), so I want to make a bash script to just run my code through all the fastq files in a directory at once. How can I change the name of the input and output files each time the…
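One way to derive per-sample input and output names inside a loop is shell suffix removal. The sketch below is only an illustration: the naming scheme `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` is an assumption (the question's actual file names are not shown), and `sample_of`, `mate_of`, and `for_each_pair` are hypothetical helper names.

```shell
# Assumption: mates are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.

# sample_of (hypothetical): print the sample name for an R1 file path.
sample_of() {
    local base
    base=${1##*/}                        # drop the directory part
    printf '%s\n' "${base%_R1.fastq.gz}" # drop the assumed R1 suffix
}

# mate_of (hypothetical): print the R2 path that belongs to an R1 path.
mate_of() {
    printf '%s\n' "${1%_R1.fastq.gz}_R2.fastq.gz"
}

# for_each_pair (hypothetical): run a command once per pair.
# $1 is the FASTQ directory; the remaining arguments are the command,
# which receives: <sample> <R1 path> <R2 path>.
for_each_pair() {
    local dir r1
    dir=$1
    shift
    for r1 in "$dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue         # skip the literal pattern if nothing matches
        "$@" "$(sample_of "$r1")" "$r1" "$(mate_of "$r1")"
    done
}
```

For a dry run, `for_each_pair fastq_dir echo` prints the sample name and both paths for each pair; replacing `echo` with a wrapper script would run the real tool with a distinct output name per sample.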
4
votes
2 answers

Keeping DNA sequence after changing FASTA header on command line

I have a FASTA header that looks like this: >7c8250ef-c89f-4d42-9d48-12c8fe245fb2 runid=606f271fc97598006ba5a922136a2c304cef75a5 sampleid=Pool12-1 read=19008 ch=301 start_time=2021-07-03T08:48:18Z barcode=barcode01 And I am able to change it to the…
rimo
  • 963
  • 1
  • 15
4
votes
1 answer

Nextflow - Process has already been used

I am trying to write a nextflow script below: /* * pipeline input parameters */ params.reads = "/path/to/fq/files/" params.guide_library = "/path/to/guidelibrary/" params.outdir = "results" log.info """\ DUAL GUIDE CRISPR QC - N F P I P E L…
user17657
  • 41
  • 1
4
votes
1 answer

How to get a GISAID account? I registered months ago, still no reply!

Inspired by amateur variant hunters, I would like to join the Pango lineage proposal community and help contribute to variant surveillance. However, I cannot seem to get access to GISAID, the platform that exclusively hosts a large share of…
AppleBees
  • 43
  • 4
4
votes
0 answers

Salmon Pseudo count when dealing with male and female RNA-seq data

I've generated quant seq data that I intend to use to compare male and female gene expression, with a focus on the sex chromosomes. My species (three-spined stickleback) has a classic XY sex determination system, and I have a male and a female…
4
votes
1 answer

What does the "X_OS_IND" column mean?

I want to do survival analysis using a subset of the TCGA LUAD dataset, whose identifiers are located here. > head(LUAD_clinicalMatrix[,c("X_EVENT","X_OS_IND")]) X_EVENT X_OS_IND 1 NA NA 2 0 0 3 1 1 4 0 …
4
votes
3 answers

mapping nucleotide sequences

I am trying to map multiple sequences against a reference genome. I tried the following code. The output file ends in .fastq.sam, but I need only .sam. How do I get it? for i in test/*.fastq; do ./minimap2 -ax map-ont reference.fna $i > $i.sam; …
hgf
  • 43
  • 3
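The double extension comes from appending `.sam` to the full input name in `$i.sam`. A minimal sketch of one way around it, using suffix removal before appending the new extension: `sam_name` and `map_all` are hypothetical helper names, and the minimap2 invocation is copied unchanged from the question.

```shell
# sam_name (hypothetical): strip a trailing ".fastq" and append ".sam".
sam_name() {
    printf '%s\n' "${1%.fastq}.sam"
}

# map_all (hypothetical): the loop from the question with the output renamed.
# $1 is the directory of FASTQ files, $2 is the reference FASTA.
map_all() {
    local dir ref fq
    dir=$1
    ref=$2
    for fq in "$dir"/*.fastq; do
        [ -e "$fq" ] || continue   # skip the literal pattern if nothing matches
        ./minimap2 -ax map-ont "$ref" "$fq" > "$(sam_name "$fq")"
    done
}
```

Calling `map_all test reference.fna` would then write `test/<name>.sam` next to each `test/<name>.fastq`. Quoting `"$fq"` also keeps the loop safe for file names that contain spaces.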
4
votes
1 answer

How can I make this Biopython program (to correct erroneous barcodes) run faster, and is there any alternative method?

This question has also been asked on Biostars. I am looking for suggestions on a bioinformatics problem. Background: Currently, I am performing a de novo whole genome assembly. At the stage of barcode correction, I lost…
4
votes
4 answers

what percentage of the human genome is MAPQ=0?

When doing Illumina 2x150bp sequencing of genomic DNA, and after aligning the reads to GRCh38, what percentage of the non-N fraction of the human genome is MAPQ=0? That is, what part corresponds to regions that can't be uniquely mapped with 2x150bp…
719016
  • 2,324
  • 13
  • 19
4
votes
1 answer

What is the NCBI's definition of an "atypical genome"?

Using the new NCBI Datasets platform, you can browse the collection of genomes associated with one or more taxa. For example, searching Pseudomonas aeruginosa returns 19,878 genomes as of 29 March 2023. In the search filtering tab, they give the…
acvill
  • 613
  • 3
  • 12
4
votes
2 answers

How can I use Arlequin via the command line?

I've got a decent knowledge of programming (incl. bash scripting) but I fail to understand how Arlequin works. Could you please help me with a very simple reproducible example on how to use Arlequin via the command line? As the .zip file comes with…
Remi.b
  • 203
  • 1
  • 8
4
votes
1 answer

Identifying somatic mutations in cell lines

I would like to identify the somatic mutations present in a cell line and characterise the genes that are potentially affected by those mutations. For example, are there oncogenes mutated in a subpopulation in the cell line? I currently have access…
Macintosh
  • 160
  • 6
4
votes
2 answers

Why does this nextflow script finish after running one sample?

I have 36 samples in total in the bam_files folder, with names like 20230306_CH_EP_C01.md.bam. I expect the code to output all 36 samples, one by one, but the run stopped after running only one sample. I have searched online, but did not…
cautree
  • 139
  • 6
4
votes
3 answers

How to retrieve sequences from a txt file by protein names from other txt file?

My problem is that I want to retrieve the sequence line + secondary structure line that match the protein names from the header.txt file. The text file (header.txt) contains protein names that look like this: 1XXIJ 1P0EA 1H9HI And the other…
Amal
  • 81
  • 3