4

I am looking for tools to extract features from short DNA sequences, for example entropy, complexity, and GC content.
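To make the features named above concrete, here is a minimal pure-Python sketch. The function names (`gc_content`, `shannon_entropy`, `linguistic_complexity`) are illustrative and not taken from PyFeat or any other library, and "complexity" is interpreted here as the fraction of distinct k-mers observed, which is only one of several possible definitions.

```python
import math
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def shannon_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy (in bits) of the k-mer distribution of seq."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def linguistic_complexity(seq: str, k: int = 3) -> float:
    """Distinct k-mers observed divided by the maximum possible number.

    Assumption: this is one simple notion of complexity, chosen for
    illustration; other definitions exist.
    """
    seq = seq.upper()
    n_windows = len(seq) - k + 1
    if n_windows <= 0:
        return 0.0
    observed = len({seq[i:i + k] for i in range(n_windows)})
    return observed / min(4 ** k, n_windows)


if __name__ == "__main__":
    example = "ATGCGCGTTAAC"  # made-up example sequence
    print(gc_content(example))
    print(shannon_entropy(example, k=2))
    print(linguistic_complexity(example, k=3))
```

Each function takes a plain string, so the results can be collected into a feature vector per sequence.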

I have found the generateFeatures.py script from the PyFeat repo, but is there a more widely used code base, or a standard way to extract features from sequences with Biopython or a similar library?

Also, I think that since a sequence of $N$ nucleotides encodes $2N$ bits (2 bits per base), at most $2N$ independent features can be extracted from it.
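The $2N$-bit count can be illustrated with a small sketch, assuming an alphabet of exactly A, C, G, T with no ambiguity codes. The mapping and the helper names `encode_2bit` and `decode_2bit` are hypothetical choices for illustration. Because the sequence can be rebuilt exactly from the packed integer, any feature is a function of those $2N$ bits.

```python
# Assumed 2-bit code per base; any fixed one-to-one mapping would do.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode_2bit(seq: str) -> int:
    """Pack a DNA string into an integer using 2 bits per base."""
    value = 0
    for base in seq.upper():
        value = (value << 2) | CODE[base]
    return value


def decode_2bit(value: int, n: int) -> str:
    """Recover a sequence of length n from its packed 2-bit encoding."""
    out = []
    for _ in range(n):
        out.append(BASES[value & 3])  # lowest 2 bits hold the last base
        value >>= 2
    return "".join(reversed(out))


if __name__ == "__main__":
    seq = "ACGT"
    packed = encode_2bit(seq)
    print(packed, packed.bit_length() <= 2 * len(seq))
    print(decode_2bit(packed, len(seq)))
```

Note that the length $N$ has to be stored separately, since leading A bases encode as zero bits.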


In addition, I am curious whether there are any transformer models for DNA sequences.


Edit: In addition, you can use DeepHF's feature utils, which can be accessed and consumed as seen here.

0x90

1 Answer

2

If you want to do this with Biopython, the Bio.SeqUtils package could be a solution.
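As a rough sketch of what that could look like: `gc_fraction` is available from `Bio.SeqUtils` in Biopython 1.80 and later. The fallback branch below is my own stand-in for environments where it cannot be imported, not part of Biopython, and the example sequences are made up.

```python
try:
    # Available in Biopython >= 1.80.
    from Bio.SeqUtils import gc_fraction
except ImportError:
    # Assumption: a simple stand-in for plain A/C/G/T input when
    # Biopython (or a recent enough version) is not installed.
    def gc_fraction(seq):
        seq = str(seq).upper()
        if not seq:
            return 0
        return (seq.count("G") + seq.count("C")) / len(seq)


def gc_features(sequences):
    """Return the GC fraction of each sequence as a list of floats."""
    return [float(gc_fraction(s)) for s in sequences]


if __name__ == "__main__":
    print(gc_features(["ATGC", "GGCC", "ATAT"]))
```

Other submodules of `Bio.SeqUtils`, such as `MeltingTemp`, may cover further features, but check the documentation for the version you have installed.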

Mr_Z