Bounds on the size of NFA for $r$-skip $k$-distinct language

Question

This question is about an extension of a language discussed in this question.

We define the $r$-skip $k$-distinct language as follows:

$$L_{r,k}=\{\sigma_1\sigma_2\cdots \sigma_{rk}\in\Sigma^{rk} | \forall i\neq j\in [rk],i=j \mod r\implies \sigma_i \neq\sigma_j\}$$

That is, the set of letters whose distance is a multiplication of $r$ are different.

This language is finite and therefore regular. Specifically, if $|\Sigma|=n$, then $|L_{r,k}|=$$({n\choose{k}}\cdot k!)^r$.

For $r=1$ we get the language $L_{k-distinct}$ defined in the linked question.

A natural NFA build for $L_{r,k}$ uses the color coding scheme and for a $(|\Sigma|,k)$-perfect hashing family $\mathcal F$ (i.e. mapping the letters into $k$ indices) selects a hash function $f\in \mathcal F$ by an epsilon transition and then verifies that each set of letters $\Sigma_i = \{\sigma_j|i=j \mod r\}$ is distinct.

Such perfect hashing family build of size $O(e^{k+\log^3 k}\cdot n\log n)$ is known

The resulting automaton is of size $O((2e)^{r(k+O(\log^3 k))}\cdot n\ \text{polylog}\ n)$, which is the same as we would get for $L_{1,rk}$ (which is $L_{rk-distinct}$) by using the same approach. (The reason for this size, at least in a naive build, is that the automaton state have to encode which of the $k$ colors we've seen for each of the $r$ letter sets).

A more careful build would give $O(4^{r(k+O(\log^2 k))}\cdot n\ \text{polylog}\ n)$ sized automaton (same goes for $L_{1,rk}$).

This sounds wasteful, as a lot fewer comparison are needed as $r$ grows larger (and $rk$ is constant).

What is the smallest automaton we can build for $L_{r,k}$?

at least there is a small alternating automaton – Denis Jul 15 '14 at 22:55 — Denis, Jul 15 '14 at 22:55

Denis · Answer 1 · 2014-07-16T16:52:04.043

I detail the comment below, as you could be interested in this answer. I don't know about NFA, but if your goal is to represent this language with a small automaton, you could use the model of alternating automata, or AFA.

Intuitively, DFA are with $0$ player: the run is updated automatically, NFA are with $1$ player who has to choose a good run, and AFA are played by two players: one tries to accept, the other to reject. The word is in the language if the "good" player (who wants to accept) has a winning strategy on this word. This does not increase the expressive power of automata.

Here, you have an AFA with essentially $kr (\log n)^k$ states: At one point the opposite player can guess that some letter with same position modulo $r$ will be equal to the current one. Then the "good" player guesses $k$ bit positions of the current letter (this requires $(\log n)^k$ memory) : one which will be different for each later equivalent position. Then we just have to check that one of these bits is indeed different every $r$ positions.

Thanks @Denis. While this is an interesting model, I'm specifically interested in either NFA or NXA (Non deterministic Xor Automaton), as these are the ones I know to simulate as a part of a larger algorithm I have. I know how to build $\widetilde O(2^{rk})$ NXA for this language and $\widetilde O(4^{rk})$ NFA, but I believe it's likely that a $[2^{k\cdot o(r)}\cdot \text{poly}\ n]$ sized NFA/NXA for it exists. — R B, Jul 16 '14 at 10:10

Bounds on the size of NFA for $r$-skip $k$-distinct language

1 Answers1