Most Popular

1500 questions
3
votes
4 answers

How to merge transcript sequence with same name in a FASTA file.

Suppose we have a fasta file like >Seq1 GTTGAGAGGTGTATGGACACGAAAAACGAAACTGTATCCCGTGTTTAGCAAAGAAATCAT >Seq1 AAAAACGAAACTGTATCCCGTGTTT >Seq2 CGTGTTTAGCAAAGAAAT I want to…
sksahu
  • 51
  • 5
3
votes
3 answers

Effect of mutation in DNA sequence on transcription factor binding sites

How much does a single mutation/alteration of a nucleotide affect the presence of a transcription factor binding site (TFBS)? I come from a computer science background (obviously). I want to make a general assumption about the number of mutated bases…
3
votes
2 answers

Convert Cotton Probe ID to Gene Symbol

I am new to bioinformatics, my background is in Electrical Engineering. I am trying to convert Affymetrix Cotton Probe IDs to gene symbols. I have a gene expression dataset and I need the expressions for only certain genes. So in the dataset, I have…
Adi
  • 41
  • 3
3
votes
1 answer

TRAL does not find "Phobos result file"

I want to use TRAL to annotate tandem repeats in the reference genome of Caenorhabditis elegans. For this, I need to install some external software, such as Phobos. I've downloaded Phobos and I am using it by typing the path to its executable like…
Biomagician
  • 2,459
  • 16
  • 30
3
votes
1 answer

Bioinformatics approach to identifying potential PCR primer sequences for a transcribed gene

I have an annotated transcriptome and would like to develop PCR primers for particular transcribed genes. My species is a non-model plant. Can I use BLAST or another tool to identify potential PCR primer sequences? Or more generally, is there a…
Peter Pearman
  • 183
  • 1
  • 6
3
votes
1 answer

Where to download baseline/average gene expression level of all human coding genes?

I am looking for the most appropriate dataset for downloading baseline gene expression levels across all human coding genes during development. I am aware that EMBL Expression Atlas is one of the resources that provide such information, but I am…
RJF
  • 181
  • 1
  • 8
3
votes
1 answer

How many reads do I need to cover the entire genome?

Suppose my genome is 3 million bases and that my reads are 100 nucleotides long. I need to know how many reads I need to cover the entire genome. I start from the equation $C = \frac{N \cdot L}{G}$ where C is the coverage, N the number of…
wrong_path
  • 391
  • 1
  • 7
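The equation in the excerpt above can be rearranged to $N = \frac{C \cdot G}{L}$. A minimal sketch of that arithmetic follows, assuming C is the average coverage, G the genome size, and L the read length (G and L are the values quoted in the question; the target coverage values and the function name `reads_for_coverage` are illustrative assumptions, not from the original post).

```python
import math


def reads_for_coverage(genome_size, read_length, coverage):
    """Solve C = N * L / G for N, rounding up to a whole number of reads."""
    return math.ceil(coverage * genome_size / read_length)


G = 3_000_000  # genome size in bases, as stated in the question
L = 100        # read length in nucleotides, as stated in the question

# Target coverages below are assumed values chosen for illustration.
for C in (1, 10, 30):
    print(f"C = {C:>2}x -> N = {reads_for_coverage(G, L, C)} reads")
```

Note that an average coverage of 1x does not mean every base is covered: if read start positions are assumed to be uniformly random, the expected fraction of uncovered bases is roughly $e^{-C}$, so a higher C is needed before the whole genome is likely to be covered.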
3
votes
2 answers

Associating SNP and GENE

Assuming I have SNP data on hg19, how can I tell which gene each SNP belongs to? The data looks like: chr10_103577643 chr10_124712463 and so on. I want to add a Gene column that says which gene each SNP belongs to. The file is…
Kozolovska
  • 241
  • 1
  • 4
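One possible approach to the question above is an interval lookup. The sketch below assumes the IDs have the form `chrom_position` as in the excerpt, and that gene coordinates are available from some hg19 annotation. The `GENES` table, its coordinates, and the gene names are hypothetical placeholders and not real annotation; in practice they would be loaded from an annotation file.

```python
def parse_snp_id(snp_id):
    """Split an ID such as 'chr10_103577643' into (chromosome, position)."""
    chrom, pos = snp_id.rsplit("_", 1)
    return chrom, int(pos)


# HYPOTHETICAL gene intervals (start, end, name), inclusive coordinates.
# These are placeholders for illustration, not real hg19 annotation.
GENES = {
    "chr10": [
        (103_500_000, 103_600_000, "HYPOTHETICAL_GENE_A"),
        (124_000_000, 124_100_000, "HYPOTHETICAL_GENE_B"),
    ],
}


def genes_for_snp(snp_id, genes=GENES):
    """Return the names of all genes whose interval contains the SNP."""
    chrom, pos = parse_snp_id(snp_id)
    return [name for start, end, name in genes.get(chrom, [])
            if start <= pos <= end]


def annotate(snp_ids, genes=GENES):
    """Build tab-separated rows with an added Gene column ('NA' if no hit)."""
    rows = []
    for snp_id in snp_ids:
        hits = genes_for_snp(snp_id, genes)
        rows.append(f"{snp_id}\t{','.join(hits) if hits else 'NA'}")
    return rows


for row in annotate(["chr10_103577643", "chr10_124712463"]):
    print(row)
```

The linear scan is kept for readability. For a large SNP file, a dedicated interval tool or a sorted interval index would likely be a better fit.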
3
votes
1 answer

Finding RNA-protein physical interaction

We all know that to identify a physical protein-protein interaction, we need to find the distance between residues from the PDB file of that interaction (that is, the distance between the alpha carbons, beta carbons, or centroids of two residues in the PDB data of the two proteins).…
Sara
  • 777
  • 1
  • 6
  • 18
3
votes
1 answer

Increase number of threads for GATK 4.0 HaplotypeCaller

I am using GATK version 4.0. I tried to use multiple threads for calling variants with HaplotypeCaller using the following command: gatk --java-options -Xmx90G -nt 28 HaplotypeCaller -I output.bam -R wheat_ref.fa -O final.vcf and the error is '-nt'…
3
votes
1 answer

Is there a way to tell which chromosome a gene is on by looking at the "Chromosome/scaffold name"?

I recently got a data set, from which I need to figure out which chromosome a gene is from, but the head of the data reads like: Gene ID Description Gene type Gene End (bp) Gene Start (bp) Strand Associated Gene Name …
Haohan Wang
  • 521
  • 3
  • 8
3
votes
1 answer

Batch detection of CRISP proteins in fasta file

Probably a naive question. I am inexperienced. I am interested in identifying potential CRISP (Cysteine-rich secretory proteins) in a certain tissue transcriptome (ca. 20k sequences in fasta). I have detected signalP and estimated % of cysteine in…
Scientist
  • 111
  • 7
3
votes
3 answers

Retrieving NCBI Taxa IDs from RefSeq or GenBank assembly accession

I have about 10,000 genome files, all named by either RefSeq or GenBank accession number. Do you know if it's possible to convert these numbers to the corresponding NCBI taxon ID or species? For example: GCA_000005845.2 to 79781 In the case of…
Biomage
  • 173
  • 7
3
votes
1 answer

How to convert files to ADAM format?

I would like to convert BAM and VCF files to ADAM format. How do I do that?
Jon Deaton
  • 399
  • 2
  • 10
3
votes
1 answer

Error in seq.default in chromPlot

I am using chromPlot to visualise the genome of C. elegans. library(chromPlot) I have created the following data frame with the lengths of C. elegans chromosomes. Chrom Start End Name 1 1 0 15072434 contigs 2 2 0 15279421…
Biomagician
  • 2,459
  • 16
  • 30