2

I am looking for an algorithm which can detect cyclic pattern in streams of characters which have random noise embedded in them.

An example of cyclic pattern with a set of characters a - m for a sequence of characters abcbacbabcecbacbabcdefefghjklm will be abcbacbabcecbacbabc. Pattern mostly cycles around abc. The latter part of the sequence efefghjklm are not forming any cycles.

Any help with this is appreciated. Also if you need more clarifications let me know.

riza-h
  • 33

1 Answers1

0

This looks like an application for Hidden Markov Models. Well this isn't even hidden.

Hidden Markov Models describe states and their transaction. The next state only depends on the current state. Hidden refers to the fact, that a state might only be measurable due to his emissions. An example for clarification:

Assume you have two types of fish. A fish can move straight, left or right. And a fish has two states called stressed and relaxed. You can only measure his movements (left, right, straight...). Such a model would look like this.

model

The two states emits with a given probability the actions left, right and straight. There are now 3 main tasks which can be done with such a model. One can determine the probability a chain of actions belongs to one fish, the most likely sequence of hidden states which lead to this observation or the estimation of all the transition and emission probabilities.

A sequence could e.g. be LRLLRSLRLSLR, and the questions: Which fish produced this sequence? When was he stressed (Stressed, Relaxed, Relaxed, Stressed,...)? What are the probabilities?

In your case there are no emissions. The states are the emissions itself. Therefor you can model transitions from one letter to another.

Im not sure if this really solves your problem. Does your sequence have to be in order? abc is obviously such a sequence. Would abNc (N for noise) be correct to? And what about bca? If your are just searching for spefic sequences which might contain other letters in between this google example could might help.