Is there a simple characterization of regular languages closed under circular shifts?

Question

A language $L$ is closed under circular shifts if, for every word $w = a_1 ... a_n$ and circular shift $w' = a_i ... a_n a_1 ... a_{i-1}$ of $w$, then $w \in L$ iff $w' \in L$. It is equivalent to require that $L$ is closed under conjugation, i.e., for every word $w = uv$, letting $w' = vu$, we have $w \in L$ iff $w' \in L$.

These closure requirements are weaker that requiring the language to be closed under permutation or commutative, i.e., for any word $w = a_1 ... a_n$, for every permutation $\sigma$ of $\{1, ..., n\}$, letting $w' = a_{\sigma(1)} ... a_{\sigma(n)}$, we have $w \in L$ iff $w' \in L$.

Is there a simple characterization of the regular languages that are closed under cyclic shifts? I'm thinking about a characterization that would make such languages easy to understand. For instance, the commutative regular languages can be easily understood: membership to the language is determined by the Parikh image, and then being regular means the language only imposes a threshold or modularity condition on the components of the Parikh image.

Some related work:

It is known that the closure of a regular language under cyclic shifts is also regular, and the same is also known of context-free languages. There is a study of the state complexity of the operation of closing under cyclic shift here but it doesn't characterize the languages that are already closed.
There is a notion of cyclic language that has been studied, e.g., here. A cyclic language is closed under conjugation and also satisfies requirement that for every word $w$ and power $n$ we have $w \in L$ iff $w^n \in L$, i.e., membership to the language is determined by the primitive root of words. (This requirement is discussed in this question.) There is also a notion of strongly cyclic language in this paper which is defined in terms of automata but do not seem to be the class I ask about.
There is a related notion of circular languages studied in bioinformatics (e.g., here) where words are quotiented to see them as circular, but I'm not sure of the relationship.

Have you looked towards equations for lattices of languages (in the sense of §XII.1 of https://www.irif.fr/~jep/PDF/MPRI/MPRI.pdf—see also §XIII.5 for the difference with classical profinite equations)? This approach seems to work at least for the cyclic languages you mentioned, see §XIV.1.5. A priori languages closed under cyclic shifts should be exactly those satisfying $xy \leftrightarrow yx$. — Rémi, Jan 29 '24 at 16:50
@Rémi I see; nice indeed that Prop 1.21 in XII.1 distinguishes both parts of the definitions of the cyclic languages. I must confess I haven't looked for now at what these equations mean, but thanks for the pointer! — a3nm, Mar 05 '24 at 10:16
The characterization can be reformulated as follows: a language L is cyclic iff, letting $S(L)$ be the syntactic semigroup of $L$ and $\textrm{Acc}$ be the subset of $S(L)$ accepting $L$, for all $x,y \in S(L)$, $xy \in \textrm{Acc} \Leftrightarrow yx \in \textrm{Acc}$. But I'm not entirely convinced that this qualifies as "easy to understand" for most (reasonable) people. — Rémi, Mar 07 '24 at 16:13
@Rémi: ah, once rephrased like this, I think the characterization is pretty easy to understand, I would even say it's almost "too easy" -- superficially, it sounds like the immediate translation of being closed under circular shifts, seen on the syntactic monoid rather than on words. But this is still nice to know, so thanks! — a3nm, Mar 08 '24 at 10:50

Denis · Accepted Answer · 2024-02-02T00:11:56.803

We can propose an automaton model characterizing regular circular languages: a C-automaton is an NFA where all states are initial. A run must see an accepting state somewhere, and must start and end in the same state. C-automata can clearly only accept regular circular languages, since a rotation of a run is still a run (circularity), and a normal NFA can guess the existence of a run (regularity). Moreover, starting from any NFA $A$, we can obtain a C-automaton for the circular closure of $L(A)$ (it is how we can prove the first item mentioned by @a3nm in the question), so the model is able to recognize any regular circular language, since such a language is its own circular closure.

This automaton model can in turn help to prove that the proposition of @MarzioDeBiasi for a MSO characterization is correct. Let us consider the logic $MSO[S']$, where S' is the successor relation $(x,x+1)$ augmented with the pair $(last,first)$. It is clear that this logic can only express regular circular languages, since $S'$ is invariant by rotation. Moreover, the fact that a C-automaton has an accepting run can be expressed in $MSO[S']$: the formula can guess a labeling by states, and verify all the conditions asked in a run of a C-automaton. Notice that the condition that the first state $q_0$ is the same as the last will be verified in the same way as all other transitions, by verifying that there is a transition $(q_{n-1},a_n,q_0)$ on the last letter. In fact the formula cannot distinguish this case from the other transitions.

Remark that the equality $x=y$ can be expressed by $\exists t. S'(t,x)\wedge S'(t,y)$, so we do not need to add it explicitly in the signature. This allows to express languages such as "there is a unique occurrence of $a$".

We can conclude that both C-automata and $MSO[S']$ characterize the class of regular circular languages.

Nice, it seems to work! A rephrase is a DFA in which the initial state is picked at random (among all states) and that state becomes the only final state. — Marzio De Biasi, Feb 02 '24 at 09:31
The random formulation is kind of a can of worms, it will give a probability that a word is accepted. What you want is more of an existential quantification, which is the same as setting all states initial. — Denis, Feb 02 '24 at 13:21
you're right, I was wondering if there is another way to get rid of the "all states initial" (another option is "an $\epsilon$ move to all states" but it doesn't add anything nor make your formulation simpler/more "elegant"). — Marzio De Biasi, Feb 02 '24 at 13:47
I'm reacting very late to this, but thanks a lot @Denis for your answer, those are two very nice characterisations! :) — a3nm, Mar 05 '24 at 10:09

Marzio De Biasi · Answer 2 · 2024-02-01T14:01:45.387

2

Just an extended note trying to recover my previous (wrong) answer.

A language $L$ is closed under cyclic shifts if and only if

$aw \in L \Leftrightarrow wa \in L\;$ ($a \in \Sigma$)

indeed after applying a single shift (rotation) you can apply the condition another time on the new string and so on.

As noted by Rémi in his comment the condition is equivalent to $xy \in L \Leftrightarrow yx \in L$;
so cyclic shift = rotation = conjugation.

In the MSO logic characterization of regular languages the above condition is equivalent to using the relation $succrot$ instead of the usual relation $<$; where $succrot$ can be defined (in MSO) as:

$succrot(x,y) \stackrel{\text{def}}{=} succ(x, y) \lor (first(y) \land last(x)) $
$= (x < y \land \neg\exists z. x<z<y) \lor (\neg\exists z . (x < z \lor z < y) )$

(still thinking if it's not too weak :-)

edited Feb 01 '24 at 14:01

answered Jan 29 '24 at 08:55

Marzio De Biasi

22,915
2
58
127

In your logic you can define the language $ab^*a$ which is not circular, by saying that there are at least 2 $a$, and for any two distinct $a$ positions $x,y$ we have $x<y$. – Denis Jan 31 '24 at 09:59
@Denis can you be more precise: in my language < doesn't exist and also "distinct" should be defined using $ <^s$. Do you see a way to define $ab^*a$ using $<^s$ instead of $<$? – Marzio De Biasi Jan 31 '24 at 11:42
yes here is the formula:
$[\exists x,y. x<^s y \wedge a(x)\wedge a(y)] \wedge [\forall x,y. a(x)\wedge a(y) \Rightarrow x=y\vee x<^s y]$. The first part says that two $a$ exist, and the second says that the only $a$'s are at the first and last position.
– Denis Jan 31 '24 at 12:08
If you don't have $x=y$ in your signature, it is equivalent to $\neg (x<^s y \vee y<^s x)$. I'm ignoring here the words with less than 2 letters but the counter-example stands regardless of the behaviour on these words. – Denis Jan 31 '24 at 12:16
(btw I forgot the "s" superscript in my very first answer, sorry it would have been much clearer with $x<^s y$ which is what I meant) – Denis Jan 31 '24 at 12:30
@Denis: you're right. I'll see if it can be fixed with an "elegant" relation. The informal condition to capture is: $\varphi \to \varphi'$ where $\varphi'$ is obtained from $\varphi$ replacing each $x$ with $y$ and adding the condition $succmod(x,y)$ defined as $y=x+1 \lor (first(y) \land last(x))$ – Marzio De Biasi Jan 31 '24 at 13:24
@Denis perhaps using a logic restricted only to the $succrot$ relation should work: $x<^{succrot}y = [(x < y) \land \lnot \exists z . x<z<y] \lor [ (first(y) \land last(x)]$ . if it is not too weak – Marzio De Biasi Feb 01 '24 at 12:31
it looks plausible. to prove that it's the case it would probably help to have an equivalent automaton model, but this does not look obvious. I was thinking about unfolding the word into a bi-infinite word labeled by $\mathbb Z$ and using usual automata model on this, but this loses information about the length. – Denis Feb 01 '24 at 20:48
maybe the corresponding automaton model is quite simple as well: no initial or final state, but there must be a transition from the last position to the first, and an accepting transition somewhere. – Denis Feb 01 '24 at 21:51

Is there a simple characterization of regular languages closed under circular shifts?

2 Answers2

Linked