11

A language $L$ is closed under circular shifts if, for every word $w = a_1 ... a_n$ and circular shift $w' = a_i ... a_n a_1 ... a_{i-1}$ of $w$, then $w \in L$ iff $w' \in L$. It is equivalent to require that $L$ is closed under conjugation, i.e., for every word $w = uv$, letting $w' = vu$, we have $w \in L$ iff $w' \in L$.

These closure requirements are weaker that requiring the language to be closed under permutation or commutative, i.e., for any word $w = a_1 ... a_n$, for every permutation $\sigma$ of $\{1, ..., n\}$, letting $w' = a_{\sigma(1)} ... a_{\sigma(n)}$, we have $w \in L$ iff $w' \in L$.

Is there a simple characterization of the regular languages that are closed under cyclic shifts? I'm thinking about a characterization that would make such languages easy to understand. For instance, the commutative regular languages can be easily understood: membership to the language is determined by the Parikh image, and then being regular means the language only imposes a threshold or modularity condition on the components of the Parikh image.

Some related work:

  • It is known that the closure of a regular language under cyclic shifts is also regular, and the same is also known of context-free languages. There is a study of the state complexity of the operation of closing under cyclic shift here but it doesn't characterize the languages that are already closed.
  • There is a notion of cyclic language that has been studied, e.g., here. A cyclic language is closed under conjugation and also satisfies requirement that for every word $w$ and power $n$ we have $w \in L$ iff $w^n \in L$, i.e., membership to the language is determined by the primitive root of words. (This requirement is discussed in this question.) There is also a notion of strongly cyclic language in this paper which is defined in terms of automata but do not seem to be the class I ask about.
  • There is a related notion of circular languages studied in bioinformatics (e.g., here) where words are quotiented to see them as circular, but I'm not sure of the relationship.
Hermann Gruber
  • 6,470
  • 1
  • 35
  • 58
a3nm
  • 9,269
  • 27
  • 86
  • 1
    Have you looked towards equations for lattices of languages (in the sense of §XII.1 of https://www.irif.fr/~jep/PDF/MPRI/MPRI.pdf—see also §XIII.5 for the difference with classical profinite equations)? This approach seems to work at least for the cyclic languages you mentioned, see §XIV.1.5. A priori languages closed under cyclic shifts should be exactly those satisfying $xy \leftrightarrow yx$. – Rémi Jan 29 '24 at 16:50
  • @Rémi I see; nice indeed that Prop 1.21 in XII.1 distinguishes both parts of the definitions of the cyclic languages. I must confess I haven't looked for now at what these equations mean, but thanks for the pointer! – a3nm Mar 05 '24 at 10:16
  • The characterization can be reformulated as follows: a language L is cyclic iff, letting $S(L)$ be the syntactic semigroup of $L$ and $\textrm{Acc}$ be the subset of $S(L)$ accepting $L$, for all $x,y \in S(L)$, $xy \in \textrm{Acc} \Leftrightarrow yx \in \textrm{Acc}$. But I'm not entirely convinced that this qualifies as "easy to understand" for most (reasonable) people. – Rémi Mar 07 '24 at 16:13
  • 1
    @Rémi: ah, once rephrased like this, I think the characterization is pretty easy to understand, I would even say it's almost "too easy" -- superficially, it sounds like the immediate translation of being closed under circular shifts, seen on the syntactic monoid rather than on words. But this is still nice to know, so thanks! – a3nm Mar 08 '24 at 10:50

2 Answers2

5

We can propose an automaton model characterizing regular circular languages: a C-automaton is an NFA where all states are initial. A run must see an accepting state somewhere, and must start and end in the same state. C-automata can clearly only accept regular circular languages, since a rotation of a run is still a run (circularity), and a normal NFA can guess the existence of a run (regularity). Moreover, starting from any NFA $A$, we can obtain a C-automaton for the circular closure of $L(A)$ (it is how we can prove the first item mentioned by @a3nm in the question), so the model is able to recognize any regular circular language, since such a language is its own circular closure.

This automaton model can in turn help to prove that the proposition of @MarzioDeBiasi for a MSO characterization is correct. Let us consider the logic $MSO[S']$, where S' is the successor relation $(x,x+1)$ augmented with the pair $(last,first)$. It is clear that this logic can only express regular circular languages, since $S'$ is invariant by rotation. Moreover, the fact that a C-automaton has an accepting run can be expressed in $MSO[S']$: the formula can guess a labeling by states, and verify all the conditions asked in a run of a C-automaton. Notice that the condition that the first state $q_0$ is the same as the last will be verified in the same way as all other transitions, by verifying that there is a transition $(q_{n-1},a_n,q_0)$ on the last letter. In fact the formula cannot distinguish this case from the other transitions.

Remark that the equality $x=y$ can be expressed by $\exists t. S'(t,x)\wedge S'(t,y)$, so we do not need to add it explicitly in the signature. This allows to express languages such as "there is a unique occurrence of $a$".

We can conclude that both C-automata and $MSO[S']$ characterize the class of regular circular languages.

Denis
  • 8,843
  • 28
  • 55
  • Nice, it seems to work! A rephrase is a DFA in which the initial state is picked at random (among all states) and that state becomes the only final state. – Marzio De Biasi Feb 02 '24 at 09:31
  • 2
    The random formulation is kind of a can of worms, it will give a probability that a word is accepted. What you want is more of an existential quantification, which is the same as setting all states initial. – Denis Feb 02 '24 at 13:21
  • you're right, I was wondering if there is another way to get rid of the "all states initial" (another option is "an $\epsilon$ move to all states" but it doesn't add anything nor make your formulation simpler/more "elegant"). – Marzio De Biasi Feb 02 '24 at 13:47
  • I'm reacting very late to this, but thanks a lot @Denis for your answer, those are two very nice characterisations! :) – a3nm Mar 05 '24 at 10:09
2

Just an extended note trying to recover my previous (wrong) answer.

A language $L$ is closed under cyclic shifts if and only if

$aw \in L \Leftrightarrow wa \in L\;$ ($a \in \Sigma$)

indeed after applying a single shift (rotation) you can apply the condition another time on the new string and so on.

As noted by Rémi in his comment the condition is equivalent to $xy \in L \Leftrightarrow yx \in L$;
so cyclic shift = rotation = conjugation.

In the MSO logic characterization of regular languages the above condition is equivalent to using the relation $succrot$ instead of the usual relation $<$; where $succrot$ can be defined (in MSO) as:

$succrot(x,y) \stackrel{\text{def}}{=} succ(x, y) \lor (first(y) \land last(x)) $
$= (x < y \land \neg\exists z. x<z<y) \lor (\neg\exists z . (x < z \lor z < y) )$

(still thinking if it's not too weak :-)

Marzio De Biasi
  • 22,915
  • 2
  • 58
  • 127
  • In your logic you can define the language $ab^*a$ which is not circular, by saying that there are at least 2 $a$, and for any two distinct $a$ positions $x,y$ we have $x<y$. – Denis Jan 31 '24 at 09:59
  • @Denis can you be more precise: in my language < doesn't exist and also "distinct" should be defined using $ <^s$. Do you see a way to define $ab^*a$ using $<^s$ instead of $<$? – Marzio De Biasi Jan 31 '24 at 11:42
  • yes here is the formula:

    $[\exists x,y. x<^s y \wedge a(x)\wedge a(y)] \wedge [\forall x,y. a(x)\wedge a(y) \Rightarrow x=y\vee x<^s y]$. The first part says that two $a$ exist, and the second says that the only $a$'s are at the first and last position.

    – Denis Jan 31 '24 at 12:08
  • If you don't have $x=y$ in your signature, it is equivalent to $\neg (x<^s y \vee y<^s x)$. I'm ignoring here the words with less than 2 letters but the counter-example stands regardless of the behaviour on these words. – Denis Jan 31 '24 at 12:16
  • (btw I forgot the "s" superscript in my very first answer, sorry it would have been much clearer with $x<^s y$ which is what I meant) – Denis Jan 31 '24 at 12:30
  • @Denis: you're right. I'll see if it can be fixed with an "elegant" relation. The informal condition to capture is: $\varphi \to \varphi'$ where $\varphi'$ is obtained from $\varphi$ replacing each $x$ with $y$ and adding the condition $succmod(x,y)$ defined as $y=x+1 \lor (first(y) \land last(x))$ – Marzio De Biasi Jan 31 '24 at 13:24
  • @Denis perhaps using a logic restricted only to the $succrot$ relation should work: $x<^{succrot}y = [(x < y) \land \lnot \exists z . x<z<y] \lor [ (first(y) \land last(x)]$ . if it is not too weak – Marzio De Biasi Feb 01 '24 at 12:31
  • it looks plausible. to prove that it's the case it would probably help to have an equivalent automaton model, but this does not look obvious. I was thinking about unfolding the word into a bi-infinite word labeled by $\mathbb Z$ and using usual automata model on this, but this loses information about the length. – Denis Feb 01 '24 at 20:48
  • maybe the corresponding automaton model is quite simple as well: no initial or final state, but there must be a transition from the last position to the first, and an accepting transition somewhere. – Denis Feb 01 '24 at 21:51