Can two strings be matched as disjoint subsequences of a string?

Question

Consider a fixed finite alphabet $A$. I am given as input two strings $S_1$ and $S_2$ on $A$, and a string $S$ on $A$. It is of course possible in PTIME to determine whether $S_1$ is a (non-contiguous) subsequence of $S$, and likewise for $S_2$. But now I ask whether I can have matches $m_1$ and $m_2$ of $S_1$ and $S_2$ as subsequences of $S$ ($m_k$ for $k \in \{1, 2\}$ is a strictly increasing function from $\{1, \ldots, |S_k|\}$ to $\{1, \ldots, |S|\}$ such that, for all $i$, $S_k[i]$ and $S[m_k(i)]$ are the same letter) such that $m_1$ and $m_2$ are disjoint (that is, their images are disjoint). Intuitively, no character of $S$ matches $S_1$ and $S_2$ simultaneously.

Is there a PTIME algorithm to determine this (in the size of the input $S_1, S_2, S$), or can it be shown to be NP-hard? I would suspect it is NP-hard but I cannot see a reduction.

In case it is in PTIME, what about the straightforward generalization, for a constant $k$, where $k$ strings are given as input and I ask for the existence of pairwise disjoint matches of all $k$ strings?

(This problem is a bit related to How hard is unshuffling a string? except the candidate strings can be different, are provided as input, and are not required to cover the entire string $S$.)

It seems that cs.se had a question about a special case of this problem where $S_1=S_2$, and no answer about the hardness was reached. — R B, Sep 25 '14 at 08:29
@RB: Thanks for pointing this out. I think neither problem is obviously "harder" than the other, though, as in my case you may have $S_1 \neq S_2$ but the specific strings to use are provided in the input. — a3nm, Sep 25 '14 at 08:41
Did you try a big dynamic program? e.g. $v[i,j,k] = True$ if the problem is solvable when the input is the first $i$ characters of $S_1$, the first $j$ characters of $S_2$, and the first $k$ characters of $S$; $v[i,j,k] = False$ otherwise. — usul, Sep 26 '14 at 02:39
@usul: You are right, I had missed it but this dynamic algorithm does seem to work. Thanks! — a3nm, Sep 26 '14 at 06:33

Marzio De Biasi · Accepted Answer · 2014-10-02T12:29:46.267

Case k=2

Perhaps you can use the following polynomial time dynamic approach:

Keep a table $T_i$ of $(|S_1|+1)(|S_2|+1)$ pairs, in which element $(x,y)=1$ if and only if after scanning $i$ elements of the reference string $S$, $x$ characters of $S_1$ have been found, and $y$ characters of $S_2$ have been found (without overlaps).

Every pair of the table $T_0 (i=0)$ is zero except for the pair $(0,0)=1$. Copy $T_i$ to $T_{i+1}$ and read the next character $c=S[i+1]$ from $S$; if in $T_i$ there is an entry $(x,y)=1$ such that $S[x+1]=c$ then in $T_{i+1}$ set $(x+1,y)=1$; if in $T_i$ there is an entry $(x',y')=1$ such that $S[y'+1]=c$ then in $T_{i+1}$ set $(x',y'+1)=1$.

At the end of the scan, $S$ contains the merged distinct copies of $S_1$ and $S_2$ if and only if $(|S_1|,|S_2|)=1$.

Simple example:

S  =  a b a a c
S1 =  a a c
S2 =  b a  

S    0,0  0,1  0,2 1,0 1,1 1,2 2,0 2,1 2,2 3,0 3,1 3,2
     1    0    0   0   0   0   0   0   0   0   0   0  < T0
a    1    0    0   1   0   0   0   0   0   0   0   0  < T1
b    1    1    0   1   1   0   0   0   0   0   0   0  < T2
a    1    1    1   1   1   1   1   1   0   0   0   0  < T3   
a    1    1    1   1   1   1   1   1   1   0   0   0  < T4       
c    1    1    1   1   1   1   1   1   1   1   1  [1] < T5

Case k fixed

The same approach above can be used for any fixed $k >2$ (the table size grows polynomially).

Case k not fixed

If $k$ strings are given as input, and $k$ is NOT FIXED: in that case there is a quick reduction from the 3-PARTITION problem which is strongly NP-complete, so it remains NPC even when the input numbers are represented in unary.

Given a set of $A = \{x_1,x_2,...,x_{3m}\}$ of $3m$ positive integers (that can be specified in unary), and a target sum $B$ we must find $m$ disjoint subsets $A_1,A_2,...,A_m \subseteq A$ of $3$ elements ($|A_i|=3$), and the sum of the elements of each subset must be $B$.

Over alphabet $\{a,b,c\}$, set:

"reference" string $S = a^3 \; b^B\; c^3 \; a^3 \; b^B\; c^3 \; ... a^3 \; b^B\; c^3 \;$ ($a^3 b^B c^3$ repeated $m$ times)
for $i = 1,...,3m$ set string $S_i = a \; b^{x_i} \; c$

Informally: the $a^3$ parts of $S$ force to match $3$ of the $S_i$s (otherwise some $S_i$ will not be included); the $c^3$ parts force to "close" $3$ of the $S_i$s. And the $b^B$ blocks (there are $m$ of them) must be filled exactly with the $b^{x_i}$ substrings of $3$ distinct $S_i$.

Small example:

3-PARTITION instance: $A = \{ 3, 2, 1, 5, 4, 3\}, B=9, (m=2)$

S  = aaa bbbbbbbbb ccc aaa bbbbbbbbb ccc
S1 = a bbb c
S2 = a bb c
S3 = a b c
S4 = a bbbbb c
S5 = a bbbb c
S6 = a bbb c

Thanks! Thinking about it for a bit more I understand that the dynamic algorithm (the workings of which I already understood when the $S_i$'s had to cover the entire $S$) does indeed work without this assumption. This satisfies me. (So, in what you posted: this is not a greedy algorithm, for what I understand, but a dynamic algorithm.) And yes, I already knew (following a previous answer by you :)) that the problem for unbounded $K$ is strongly NP-hard by a reduction from 3-PARTITION. :) — a3nm, Sep 26 '14 at 06:35
Your version of 3-PARTITION is wrong. In the real 3-PARTITION, it is necessary to divide $A$ into $m$ sets of size $3$, not into $3$ sets of size $m$. Your version is not strongly NP-hard (unless P=NP), because there is a pseudopolynomial algorithm similar to that for knapsack problem. — abacabadabacaba, Oct 02 '14 at 11:46
@abacabadabacaba: you are right, too many reductions in my mind :-) However the reduction I had in mind follows the correct definition (see another similar answer); I fixed the answer. — Marzio De Biasi, Oct 02 '14 at 12:31

Can two strings be matched as disjoint subsequences of a string?

1 Answers1