2

I have a list of human genes that microarray analysis has shown to be up-regulated in a disease condition of interest. I also have a protein that the literature describes as a DNA-binding protein. I would like to predict computationally whether this protein binds any of the promoter sequences of my differentially expressed genes and, if so, at which position.

Are there any bioinformatics tools available for this task?

Charles

2 Answers

4

You could run FIMO across the whole genome with the motifs of your TFs (transcription factors, i.e. DNA-binding proteins) of interest. This gives you predicted binding sites: genomic intervals where each motif matches.

https://bioinformatics.stackexchange.com/a/2491/776
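
As a rough illustration of how FIMO hits become genomic intervals, here is a minimal sketch that turns rows of FIMO's tab-separated output into BED-style rows. It assumes the column layout of recent MEME Suite releases (motif_id, motif_alt_id, sequence_name, start, stop, strand, score, p-value, q-value, matched_sequence), that sequence_name is a chromosome name, and that start/stop are 1-based and inclusive; check all of this against your own fimo.tsv. The function name, the example rows and the motif ID are made up.

```python
import csv

# Hypothetical excerpt of a fimo.tsv file; every value here is invented.
FIMO_TSV = (
    "motif_id\tmotif_alt_id\tsequence_name\tstart\tstop\tstrand\tscore\tp-value\tq-value\tmatched_sequence\n"
    "MA0000.1\tTF_A\tchr1\t2000801\t2000810\t+\t15.2\t2.1e-06\t0.01\tACGTACGTAC\n"
    "MA0000.1\tTF_A\tchr1\t3000001\t3000010\t-\t8.4\t4.0e-04\t0.52\tACGTTCGTAC\n"
    "\n"
    "# trailing comment lines are skipped\n"
)


def fimo_tsv_to_bed(tsv_text, max_pvalue=1e-5):
    """Return sorted BED6-style tuples for hits with p-value <= max_pvalue.

    Assumes 1-based inclusive start/stop in the input, so only the start
    is shifted by one to get BED's 0-based, half-open intervals.
    """
    lines = [
        ln for ln in tsv_text.splitlines()
        if ln.strip() and not ln.startswith("#")
    ]
    bed = []
    for row in csv.DictReader(lines, delimiter="\t"):
        if float(row["p-value"]) > max_pvalue:
            continue  # hit is weaker than the chosen threshold
        start0 = int(row["start"]) - 1
        end = int(row["stop"])
        name = row["motif_alt_id"] or row["motif_id"]
        bed.append((row["sequence_name"], start0, end, name, row["score"], row["strand"]))
    # Sort by chromosome name, then start, then end.
    return sorted(bed, key=lambda r: (r[0], r[1], r[2]))


bed_rows = fimo_tsv_to_bed(FIMO_TSV)
for r in bed_rows:
    print("\t".join(str(x) for x in r))
```

Only the first example row passes the 1e-5 cutoff. Writing the printed lines to a file gives something the BEDOPS tools can read, provided the sort order matches what they expect.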

You can then map promoter regions to those TF binding sites with BEDOPS bedmap, or intersect the two sets with bedops:

https://bedops.readthedocs.io/en/latest/content/reference/statistics/bedmap.html

https://bedops.readthedocs.io/en/latest/content/reference/set-operations/bedops.html

I recommend bedmap for assigning TFs to a promoter of interest: it returns both the promoter region and any TF binding sites that overlap it under the overlap criteria you specify.
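
To make the mapping idea concrete, here is a small pure-Python sketch of what "report each promoter together with the binding sites that overlap it" means. It is not how bedmap is implemented (it does a naive scan rather than a sweep over sorted input), and the gene names, TF names, coordinates and the `min_overlap_bp` parameter are all hypothetical.

```python
from collections import defaultdict

# Hypothetical promoters: (chrom, start, end, gene), 0-based half-open like BED.
promoters = [
    ("chr1", 1000, 2000, "GENE_A"),
    ("chr1", 5000, 6000, "GENE_B"),
    ("chr2", 100, 1100, "GENE_C"),
]

# Hypothetical TF binding sites: (chrom, start, end, tf_name).
sites = [
    ("chr1", 1990, 2005, "TF_X"),
    ("chr1", 2000, 2010, "TF_Y"),  # starts where GENE_A's promoter ends: no shared base
    ("chr2", 500, 512, "TF_X"),
]


def map_sites_to_promoters(promoters, sites, min_overlap_bp=1):
    """Return a list of (gene, hits); hits are (tf_name, start, end) tuples
    that share at least min_overlap_bp bases with the gene's promoter."""
    by_chrom = defaultdict(list)
    for chrom, start, end, tf in sites:
        by_chrom[chrom].append((start, end, tf))

    result = []
    for chrom, p_start, p_end, gene in promoters:
        hits = []
        for s_start, s_end, tf in by_chrom[chrom]:
            # Number of shared bases between two half-open intervals.
            overlap = min(p_end, s_end) - max(p_start, s_start)
            if overlap >= min_overlap_bp:
                hits.append((tf, s_start, s_end))
        result.append((gene, hits))
    return result


mapped = map_sites_to_promoters(promoters, sites)
for gene, hits in mapped:
    print(gene, hits)
```

Promoters with no overlapping site are still reported, with an empty list, which makes it easy to see which genes have no predicted binding site at all.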

Alex Reynolds
  • Thanks a lot for your very helpful response. "FIMO scans a set of sequences for individual matches to each of the motifs you provide." So how do I relate the output from FIMO to TF binding sites and genomic intervals? Is there a way to get the names of the genes whose promoter regions bind my TF? I have used the FIMO web tool http://meme-suite.org/tools/fimo. – Charles May 31 '19 at 08:50
  • I don't understand. If you have promoters, those are derived from genes, no? – Alex Reynolds May 31 '19 at 12:20
  • Sorry for the silly question, but which column of FIMO output corresponds to the TF binding sites? I now want to intersect promoter regions with those TF binding sites using BEDOPS bedops. – Charles May 31 '19 at 14:03
  • 1
    The output from the command-line tool should be in BED format, so the binding site of the TF would be the first three columns. I don't know what format comes out of the web tool, but I would start by reading the documentation. – Alex Reynolds May 31 '19 at 14:31
  • I have used the command-line tool by running: fimo [options] as described here http://meme-suite.org/doc/fimo.html. But none of the output from that command is a .bed file. The output files are fimo.html, fimo.tsv, fimo.gff, cisml.xml and fimo.xml. I want to compare the results from the command line with those from the online tool, which is why I ran both. – Charles May 31 '19 at 14:59
  • Maybe read the SE post I linked to at the top of my answer. This has a fimo command that writes to a BED file. Hope this helps! – Alex Reynolds May 31 '19 at 15:15
  • Thank you once more. However, I find many repeats after running FIMO. The repeats also appear when I discover motifs in my transcription factor's DNA sequence (obtained from Ensembl) using MEME, with the number of motifs set to 3 and the motif width set to 10 bp. I generated the promoter sequences fed into FIMO with the genome browser, defining the promoter as the 1000 bp upstream of the TSS. Is there a way to deal with the repeats at any of these steps? The genes I use actually come from a single-cell RNA-seq experiment. – Charles Jun 04 '19 at 16:33
  • Is it normal for a motif to look like this: CACACACACACACACACACACACACACACACACACACACACACA – Charles Jun 04 '19 at 21:10
  • 1
    It seems like you might be doing things out of order. You would run FIMO on the whole genome (mappable regions) against JASPAR or another published motif database, limiting your search to motifs no longer than, say, 20 nt. Then do your set operations between this result and your promoters, at a desired statistical threshold (1e-5 or less). If you're looking to predict putative motifs from your promoters directly, then you would use MEME, not FIMO. Try re-reading the answer I gave. – Alex Reynolds Jun 05 '19 at 00:38
  • Thanks a lot, I have figured out many things. After intersecting the TF binding sites with promoter sequences 500 bp upstream of the TSS, I obtain results like this: chr1 2000801 2000900 . I have searched the internet for how to convert these genomic coordinates into gene names, to no avail. Is there a tool I can use for this? Secondly, is it possible to get graphical output from bedops? Thanks – Charles Jun 06 '19 at 06:35
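
On turning coordinates back into gene names: one option is to put the gene symbol in the name column of the promoter BED file before intersecting, so any promoter interval that comes back already carries its gene. Below is a minimal sketch, assuming you have TSS positions and strands from an annotation. The helper names, gene names, coordinates and the `upstream` default are hypothetical, and the window definition (upstream bases only, TSS base excluded) is just one possible convention.

```python
def promoter_window(chrom, tss, strand, gene, upstream=500):
    """Return a BED6-style tuple for the region upstream of a TSS.

    tss is the 0-based position of the first transcribed base. "Upstream"
    means lower coordinates on the + strand and higher coordinates on the
    - strand. The TSS base itself is not included.
    """
    if strand == "+":
        start, end = max(0, tss - upstream), tss
    elif strand == "-":
        start, end = tss + 1, tss + 1 + upstream
    else:
        raise ValueError(f"unexpected strand: {strand!r}")
    return (chrom, start, end, gene, ".", strand)


def genes_at(chrom, start, end, windows):
    """Return names of promoter windows sharing at least one base with
    the 0-based half-open interval [start, end) on chrom."""
    return [
        w[3] for w in windows
        if w[0] == chrom and min(w[2], end) > max(w[1], start)
    ]


# Hypothetical annotation: (chrom, tss, strand, gene).
annotation = [
    ("chr1", 2001300, "+", "GENE_A"),
    ("chr1", 50000, "-", "GENE_B"),
]
windows = [promoter_window(*rec) for rec in annotation]

for w in windows:
    print("\t".join(str(x) for x in w))  # BED lines with the gene in column 4

print(genes_at("chr1", 2000801, 2000900, windows))
```

With the name column filled in like this, a mapping step that echoes the promoter element alongside the overlapping sites will show the gene name next to each hit, rather than bare coordinates.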
0

If you work with human data, I would start by intersecting your promoter coordinates with the ReMap2018 database, which is a comprehensive collection of published ChIP-seq datasets. Extract those TFs that fit your research question. Alternatively, you can search NCBI for published datasets of your TF and then intersect those with your annotations.

Also cross-posted: https://www.biostars.org/p/381751/

  • Thanks a lot for your response, but could you please elaborate a little more? For example, I do not understand what "intersecting promoter coordinates" with a database means. – Charles Nov 17 '19 at 11:25
  • 1
    The database provides lists in BED format and you have genomic coordinates. You can use something like bedtools intersect to scan them for overlaps. –  Dec 17 '19 at 12:14