
I am trying to build a feature matrix to be used for Random Forest-based classification. As features, I'd like to add short motifs that are common to all the protein sequences belonging to a specific gene family.

I tried MEME, but I'd like to know whether there are other good tools (online or command line). Keep in mind that I have more than 10,000 sequences and that I prefer short motifs (3 to 4 aa long).
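For very short motifs, exhaustive enumeration is a simple baseline to compare against MEME: there are only 20^3 = 8,000 possible 3-mers and 20^4 = 160,000 possible 4-mers, so every exact k-mer in 10,000 sequences can be counted directly. The sketch below is a naive exact k-mer approach, not MEME's statistical motif model (no wildcards, no degenerate positions). The function names `conserved_kmers` and `feature_matrix` and the `min_fraction` parameter are hypothetical, introduced only for illustration; `min_fraction` is there on the assumption that a k-mer present in literally every sequence of a large family may not exist, so you may need to relax "all" to "most".

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set


def kmer_set(seq: str, ks: Iterable[int]) -> Set[str]:
    """All distinct substrings of each length k in ks found in one sequence."""
    found = set()
    for k in ks:
        for i in range(len(seq) - k + 1):
            found.add(seq[i:i + k])
    return found


def conserved_kmers(family: Sequence[str], ks=(3, 4),
                    min_fraction: float = 1.0) -> List[str]:
    """k-mers occurring in at least min_fraction of the family's sequences.

    Hypothetical helper: each k-mer is counted at most once per sequence,
    so its count equals the number of sequences that contain it.
    """
    counts = Counter()
    for seq in family:
        counts.update(kmer_set(seq, ks))
    threshold = min_fraction * len(family)
    return sorted(m for m, c in counts.items() if c >= threshold)


def feature_matrix(seqs: Sequence[str], motifs: Sequence[str]) -> List[List[int]]:
    """Binary presence/absence matrix: one row per sequence, one column per motif."""
    lengths = {len(m) for m in motifs}
    rows = []
    for seq in seqs:
        present = kmer_set(seq, lengths)
        rows.append([1 if m in present else 0 for m in motifs])
    return rows


# Toy example with made-up sequences
family = ["MKTAYIAK", "MKTAYLAK", "GMKTAYQ"]
motifs = conserved_kmers(family)
print(motifs)  # ['KTA', 'KTAY', 'MKT', 'MKTA', 'TAY']
print(feature_matrix(["MKTAYIAK", "ACDKTAW"], motifs))
# [[1, 1, 1, 1, 1], [1, 0, 0, 0, 0]]
```

The resulting rows can be used directly as the feature matrix for a Random Forest. For classification, k-mers conserved within one family are only useful if they are also rarer in the other families, so it may be worth filtering the motif list against the other classes before training.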

wrong_path

1 Answer


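I think MEME is a good tool for your purpose, but there are others as well. InterProScan is one example, although it is not really de novo: it scans sequences against the signatures of the InterPro member databases (Pfam, PROSITE, etc.) rather than discovering new motifs. Here is a summary of available tools.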

benn