Most Popular
1500 questions
6
votes
3 answers
Do any publicly available databases detail protein structure and functional domains?
I am interested in finding a database that takes a gene or protein name as input (possibly with the option to specify transcript) and gives information about the protein's functional domains in terms of either specific residue ranges, genomic…
Dan
- 63
- 2
6
votes
2 answers
How to safely and efficiently convert a subset of a BAM file to FASTQ?
Question
How can I extract reads from a bam file (produced by bwa-mem) to fastq given a list of reference sequences to filter out?
Potential difficulties
maintaining FR orientation of paired-end reads (in bam all the sequences are reference…
Kamil S Jaron
- 5,542
- 2
- 25
- 59
6
votes
2 answers
Same transcript coordinates in gtf file, different transcript ID
I have a gtf file from Ensembl, and I noticed that several "transcript" annotations have the exact same coordinates. See for instance the third and fourth transcripts ("Y74C9A.2b.1" and "Y74C9A.2b.4") for this gene:
$ grep "WBGene00022276" genes.gtf…
bli
- 3,130
- 2
- 15
- 36
6
votes
3 answers
How to convert a binary matrix of gene presence or absence into a fasta sequence
I have a binary matrix of gene presence or absence which looks like: [roary output]
Gene sample1 sample2 sample3 sample4
fliI 1 1 1 1
patB_1 1 1 1 1
pgpA 1 1 1 1
osmB …
AudileF
- 955
- 8
- 25
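One possible reading of the question above is that each sample becomes a pseudo-sequence with one character per gene. The following is a minimal sketch under that assumption: the function name `matrix_to_fasta`, the A/C encoding for present/absent, and the toy table are all hypothetical and not taken from the question or from roary.

```python
def matrix_to_fasta(text, present="A", absent="C"):
    """Turn a whitespace-delimited presence/absence table into FASTA text.

    Assumes the first row is a header (gene column, then sample names)
    and every later row is a gene name followed by 1/0 values.
    """
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    samples = header[1:]
    # Build one character per gene, in row order, for every sample column.
    seqs = {sample: [] for sample in samples}
    for row in body:
        for sample, value in zip(samples, row[1:]):
            seqs[sample].append(present if value == "1" else absent)
    return "".join(f">{s}\n{''.join(chars)}\n" for s, chars in seqs.items())


# Hypothetical toy table, not real roary output.
table = """Gene sample1 sample2 sample3
fliI 1 1 1
patB_1 1 0 1
pgpA 0 1 1
"""
fasta = matrix_to_fasta(table)
print(fasta)
```

Whether a two-letter nucleotide encoding is appropriate depends on the downstream tool; some phylogenetics programs accept binary (0/1) characters directly.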
6
votes
1 answer
Where can I find a GWAS data set about genotypes and phenotypes for Zebrafish and rice?
My goal is to work on GWAS method development as a research project. However, human genome data are usually confidential because of the identification problem, so it's very hard to get them. (application takes a long time and I may not get it at the…
Haohan Wang
- 521
- 3
- 8
6
votes
0 answers
Changing BLAST parameters using the NCBIWWW module
I have found a blog post with a script that I would like to use for my current research project: link
The script is incredibly fast and produces a smooth conservation plot. In the blog post, the author mentions that it would be totally possible to…
bluescholar1212
- 421
- 2
- 10
6
votes
2 answers
Combining the use of workdir and the --jobscript option in Snakemake
It seems that in Snakemake the script specified after --jobscript cannot be used properly if a workdir: is specified in the Snakefile.
The path of the script specified becomes relative to the workdir defined in the snakefile instead of being…
Eric C.
- 208
- 2
- 5
6
votes
3 answers
How can I compare two bed files?
I am trying to translate (lift over) bed files describing genomic regions from hg19 (GRCh37) to hg38 (GRCh38). I have tried both UCSC's LiftOver tool and CrossMap but saw that they give me different results. I therefore need a way of assessing how correct the results…
terdon
- 10,071
- 5
- 22
- 48
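One simple way to quantify how much two BED files agree is a base-pair Jaccard index (covered bases shared by both files divided by bases covered by either). The sketch below uses only the standard library; the function names and the toy intervals are illustrative assumptions. It measures agreement between two outputs, not which of them is correct.

```python
def parse_bed(text):
    """Return {chrom: [(start, end), ...]} from BED text (0-based, half-open)."""
    regions = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions


def merge(intervals):
    """Collapse overlapping or touching intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered(intervals):
    """Total number of bases in a list of disjoint intervals."""
    return sum(end - start for start, end in intervals)


def jaccard(bed_a, bed_b):
    """Base-pair Jaccard index between two BED texts, summed over chromosomes."""
    a, b = parse_bed(bed_a), parse_bed(bed_b)
    inter = union = 0
    for chrom in set(a) | set(b):
        ma, mb = merge(a.get(chrom, [])), merge(b.get(chrom, []))
        len_union = covered(merge(ma + mb))
        union += len_union
        # |A ∩ B| = |A| + |B| - |A ∪ B| for the merged interval sets.
        inter += covered(ma) + covered(mb) - len_union
    return inter / union if union else 0.0


# Hypothetical toy intervals standing in for two liftover outputs.
bed_a = "chr1\t100\t200\nchr1\t300\t400\n"
bed_b = "chr1\t150\t250\nchr1\t300\t400\n"
score = jaccard(bed_a, bed_b)
print(round(score, 3))
```

A score of 1.0 means the two files cover exactly the same bases; lower values point to regions worth inspecting individually.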
6
votes
2 answers
What tools can I use for a bacterial core/pan genome pipeline?
I want to perform a genome comparison on a group of isolates. I want to look into two broad groups of taxa and compare the accessory genome in each group. I have been using prokka (v1.12) and roary (v3.8.2) to do this but it appears the…
AudileF
- 955
- 8
- 25
6
votes
3 answers
Filter Trinity transcriptome based on RNASeq reads
I have recently generated a genome-guided transcriptome with Trinity, and would like to apply an additional filter to exclude transcripts that don't have good support from the RNASeq reads. This is with the goal of trying to reduce the initial…
gringer
- 14,012
- 5
- 23
- 79
6
votes
3 answers
Tools to reconcile experimental transcripts with reference annotation
I am looking for tools to reconcile an alignment file of experimental transcripts mapped to the genome (SAM/BAM) with the reference transcriptome annotation (GTF) from Ensembl (organism: D. melanogaster).
The aim would be to check which transcripts reported in…
aechchiki
- 2,676
- 11
- 34
6
votes
1 answer
How to select a power for a scale-free topology network
In a weighted gene co-expression network analysis (using WGCNA), the soft-threshold power is recommended as a noise filter. It consists of raising the correlation to a given exponent. To decide on this power, the scale-free topology is estimated for…
llrs
- 4,693
- 1
- 18
- 42
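The soft-thresholding step mentioned in the excerpt above, raising a correlation to a power, can be sketched as follows for the unsigned case, where the adjacency is |cor|^power. This is a plain-Python illustration, not WGCNA's implementation; the function names and the toy expression values are assumptions, and the scale-free fit used to pick the power is not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def soft_threshold_adjacency(expr, power):
    """Unsigned adjacency |cor(x_i, x_j)| ** power for a gene -> values dict."""
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** power
        for g in genes
        for h in genes
        if g != h
    }


# Hypothetical toy expression profiles (four samples per gene).
expr = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [1, 3, 2, 4],
}
adj = soft_threshold_adjacency(expr, power=6)
for pair in [("geneA", "geneB"), ("geneA", "geneC")]:
    print(pair, round(adj[pair], 4))
```

Because correlations lie between -1 and 1, a higher power shrinks moderate correlations much more than strong ones, which is why the step acts as a noise filter.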
6
votes
4 answers
Multi-sequence alignment of samples with multiple contigs each
I want to perform a multi-sequence alignment on 12 samples that clustered based on cgMLST. The ultimate goal is to find out whether they differ by the presence of a few genes.
I have performed multi-sequence alignment in the past using MAFFT, the main…
BCArg
- 283
- 2
- 12
6
votes
2 answers
Where to find asymmetric nucleotide substitution matrix with IUPAC encodings?
Update: I submitted a pull request to the Biostrings repo. The functionality I describe in my question and answer can now be implemented with nucleotideSubstitutionMatrix(symmetric = FALSE).
I am using the pairwiseAlignment function from the…
acvill
- 613
- 3
- 12
6
votes
1 answer
Why does GATK produce both 0/1 and 1/0 genotypes in the same file? Are the two not equivalent?
I have always thought that 1/0 and 0/1 in VCF genotype fields are equivalent. And yet, GATK uses both. For example, these are two variants called in the same sample and the same run of GATK 4.1.4.0:
chr7 117120317 . …
terdon
- 10,071
- 5
- 22
- 48
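In the VCF specification, "/" separates unphased alleles and "|" separates phased ones, so the order of alleles in an unphased genotype carries no phase information. When comparing calls, one option is to normalise unphased genotypes before testing equality. The helper below is a hypothetical sketch of that idea and says nothing about why a particular caller emits either order.

```python
def normalize_gt(gt):
    """Sort the alleles of an unphased genotype; leave phased ones untouched."""
    if "|" in gt or "/" not in gt:
        # Phased or haploid genotypes are returned as-is.
        return gt
    alleles = gt.split("/")

    def sort_key(allele):
        # Missing alleles ('.') sort after numeric allele indices.
        return (allele == ".", int(allele) if allele.isdigit() else 0)

    return "/".join(sorted(alleles, key=sort_key))


for gt in ["0/1", "1/0", "1|0", "2/1", "./."]:
    print(gt, "->", normalize_gt(gt))
```

With this normalisation, `normalize_gt("1/0") == normalize_gt("0/1")`, while `"1|0"` and `"0|1"` remain distinct because phase is meaningful there.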