27

I've been thinking about the following problem for a time, and I haven't found a polynomial solution for it. Only brute-fource. I've been trying to reduce an NP-Complete problem into it too with no sucess.

Here is the problem:


You have a sorted set $\{(A_1, B_1), (A_2, B_2), \ldots, (A_n, B_n)\}$ of positive integers pairs.

$(A_i, B_i) < (A_j, B_j) \Leftrightarrow A_i < A_j \lor (A_i = A_j \land B_i < B_j)$ $(A_i, B_i) = (A_j, B_j) \Leftrightarrow A_i = A_j \land B_i = B_j$

The following operation can be applied to a pair: Swap(pair). It swaps the elements of the pair, so $(10, 50)$ will become $(50, 10)$

When a pair in the set is swapped, the set automatically gets sorted again (the swapped pair is out of place and it will get moved into its place in the set).

The problem consist on see if there is a sequence that, starting on some pair, swaps the entire set, with the following condition:

After a pair is swapped, the next pair to be swapped has to be either the successor or the predecesor pair in the set.


It would be great to find a polynomial time solution to this problem, or a reduction of an NP-Complete problem into it.

Note:
It's already a decision problem. I don't want to know which the sequence is: only if a sequence exists.

Example of how the set gets sorted after swapping a pair

$\textbf{(6, 5)}$
$(1,2)$
$(3,4)$
$(7,8)$

If I swap the first pair, it becomes to: $(5,6)$, and after sorting the set (placing the sorted pair in its new position), we have:

$(1,2)$
$(3,4)$
$\textbf{(5,6)}$
$(7,8)$

Then I have to swap either the $(3,4)$ (predecessor) pair or $(7,8)$ (sucessor), and repeat the process until all pairs are swapped (if possible).

Important:
You cannot swap an already swapped pair.
If there is a sequence of 'swap' operations, then all pairs has to be renamed to once and only once.

Example where isn't possible to swap all pairs

$(0, 0)$
$(1, 4)$
$(3, 2)$
$(5, 5)$

Oscar Mederos
  • 443
  • 3
  • 12
  • 1
    Is the list sorted after you rename the file and before you choose the next file to rename? Can you rewrite the sorting condition as: $(A,B,C) < (A',B',C')$ iff ($A < A'$) or ($A=A'$ and $B<B'$) or ($A=A'$ and $B=B'$ and $C < C'$)? – mjqxxxx Jan 31 '11 at 01:45
  • @mjqxxxx yes, it's sorted before choosing the next file (above/below) to rename. And yes, that could be the sorting condition, although you can use the one you prefer (for example, take the entire filename as an unique string) – Oscar Mederos Jan 31 '11 at 01:49
  • 1
    can you give an example of the list of filenames before and after applying "rename"? Also, what do you mean with a sequence? Normally, a sequence is a string of characters. Can you give an example of the kind of "sequence" you are looking for? – Marcos Villagra Jan 31 '11 at 02:05
  • @Marcos With a "sequence" I mean "a list of rename operations" – Oscar Mederos Jan 31 '11 at 02:13
  • now is more clear. Now I just want to check if I understand your question: Is there a sequence of "rename" operations that touches all files, given that you started from an arbitrary file? – Marcos Villagra Jan 31 '11 at 02:26
  • Yes. And of course, you cannot rename an already renamed file. I mean: "touch all the files" is equivalent to say "all the files has to be renamed to 'Part2, Part1.ext'". – Oscar Mederos Jan 31 '11 at 02:29
  • the restriction of choosing only the file from above or below makes the problem hard. Without this restriction there is a simple reduction from 2-SAT. – Marcos Villagra Jan 31 '11 at 03:15
  • @Marcos That restriction is really important. In fact, if you could just select the file you want after renaming one, there is always a way of renaming all the files. In fact, the answer is in O(1): Always True. I don't see where is the reduction from 2-SAT. Could you explain please? – Oscar Mederos Jan 31 '11 at 03:21
  • Sorry, I've meant "to 2-SAT". It cannot be "from". As you've said, always true. So my answer is nonsense, maybe just thinking "out-loud". I'm thinking in local search, but its a little bit hard to prove that with high probability "all files are touched". – Marcos Villagra Jan 31 '11 at 03:28
  • No problem. As soon as I have 50 of rating, I will start a bounty in this question, to see if someone can help on it :) – Oscar Mederos Jan 31 '11 at 03:33
  • I haven't vote yet, so try it :-) – Marcos Villagra Jan 31 '11 at 03:36
  • Reading the FAQ, I realized that 75 of rating is required to start a bounty :( http://stackoverflow.com/faq – Oscar Mederos Jan 31 '11 at 03:40
  • and you should wait at least 2 days. Maybe by then you'll get enough reputation. – Marcos Villagra Jan 31 '11 at 03:47
  • One more thing. There is subtle thing you should add to your quesiton. You should decide if you have a total order or a partial order on the elements in your list. If you consider a partial order, note that if a file goes on the top, and the file below is incomparable to this file, you will not be able to renamed (eve if it wasn't before). This will occurr for example for (A,B) and (B,A) – Marcos Villagra Jan 31 '11 at 03:51
  • 1
    (1) What is the role of the “extension” part? If you assume 2n strings appearing as “Part1” and “Part2” are all unique, then I do not see any relevance of the “extension” part. (2) It seems weird that the only initial list can be unsorted. Is this what you want? (3) What is the motivation behind this question? – Tsuyoshi Ito Jan 31 '11 at 03:58
  • @Tsuyoshi (1) I just added the extension because the problem was about files (yep, I know in Linux the aren't extensions). (2) I don't understand what you mean. (3) That was one of four problems assigned to me in the course of Algorithms Design & Analisis in my career (Computer Science). – Oscar Mederos Jan 31 '11 at 04:35
  • 3
    Assignment problems are not welcome on cstheory.stackexchange.com in general. – Tsuyoshi Ito Jan 31 '11 at 04:41
  • @Marcos You can see it as a total order. To make it simpler, just imagine that there aren't two files that when one of them is renamed, it becomes equal to the other. – Oscar Mederos Jan 31 '11 at 04:41
  • @Tsuyoshi Even if I delivered what I did about this problem three weeks ago? I read the FAQ before posting it and it didn't say anything about it :/ – Oscar Mederos Jan 31 '11 at 04:43
  • 3
    Hmm, I am not sure. Usually the logic here is that it is not a good practice to answer typical homework questions because doing so will ruin the purpose of homework for someone in the future. But in this case, the problem does not look like a typical problem. – Tsuyoshi Ito Jan 31 '11 at 04:46
  • Hopefully people on cstheory will have good ideas to attack this problem ;) Although I won't stop thinking about it ... ;) – Oscar Mederos Jan 31 '11 at 04:50
  • About my previous comment (2): As I understand it, when the list is sorted for the first time, the input list is completely unsorted. After that point, the list is always almost sorted (only one item may be out of place). This makes the first sort very different from the other sorts, and it looked like a weird setting to me. But if that is what is being asked, I am not the right person to complain about the problem setting. :) – Tsuyoshi Ito Jan 31 '11 at 04:50
  • Hmm.. I forgot to say that the input list of files is sorted, but that doesn't change too much it. You are right, after each rename, only one file is out of its position. – Oscar Mederos Jan 31 '11 at 04:54
  • (1) It does not essentially change the problem, but it reduces the weirdness I expressed. Now the problem is stated in a better way. :) (2) Can you give an example of the input where the answer is no? I think that it will help us (or at least me) understand the problem better. – Tsuyoshi Ito Jan 31 '11 at 05:07
  • @Tsuyoshi I will edit it and post an example of 4 files. – Oscar Mederos Jan 31 '11 at 05:12
  • Be careful: (1) NP and NP-complete are different. (2) To prove that some problem is NP-hard, you have to reduce a known NP-hard problem to that problem (note the direction). – Tsuyoshi Ito Jan 31 '11 at 05:13
  • Here is an idea: think of each line (x,y) as a node of a graph. Connect (x,y) to (y-1,z) and (y+1,z') with directed edges. If I understood correctly, you are asking for a Hamiltonian path in this graph (well, not exactly since after each rename a small number of edges will get removed/added). (ps: I voted to close as this didn't seem to be a research level question). – Kaveh Jan 31 '11 at 05:19
  • @Tsuyoshi I edited it, and I was refering to a reduction of an NP-Complete into this problem, to prove that it's NP-Complete. – Oscar Mederos Jan 31 '11 at 05:19
  • in edit 4, the non-existence of a sequence depends on the first entry you choose. It also depends on the selection you made after you renamed the entry. For that example, there is such a sequence if you select the entries carefully. – Marcos Villagra Jan 31 '11 at 07:43
  • 2
    maybe if you give a motivation different than "it was a homework", people could get interested and it won't be closed. What could be a possible application of this? – Marcos Villagra Jan 31 '11 at 07:58
  • @Marcos Could you give me the steps to follow to sort all files in the edit 4? – Oscar Mederos Jan 31 '11 at 08:08
  • What I see here, is a problem that hasn't been solved in my career for about 3 years. If we can find a solution, then it can be rewritten as a more technically problem (without files, extensions). For example, if it is a graph problem, it could be writen in terms of a graph and it could be really helpful for others to know that this problem is NP-Complete or not. Of course, nobody will have to deal with this problem (renaming files), but I just don't have a way to rewrite it now because I don't really know the main subject/topic it belongs. – Oscar Mederos Jan 31 '11 at 08:11
  • well, in fact it depends in the way you define the order. For example, start at (P,P), rename it and stays there. Then you go (C,B), it turns in (B,C). Here if you only consider the order in the first column, it just stays there, and you can continue up. If you also consider the second column, it will move up, and you get stuck – Marcos Villagra Jan 31 '11 at 08:40
  • so I imagine that here we care about the second column – Marcos Villagra Jan 31 '11 at 08:41
  • 2
    about reformuulating the problem, you can forget about files and see it this way. You have a set of pairs of positive integers $A={(x_1,y_1),\dots,(x_n,y_n)}$, and the rules are the same as you put it. Initially is sorted in the first column, then you start renaming the points. – Marcos Villagra Jan 31 '11 at 08:46
  • @Marcos yes, the 2nd column is important. Just imagine you're comparing two strings without the ','. That's why it doesn't work for that example. ;) – Oscar Mederos Jan 31 '11 at 08:54
  • @Marcos I'll try to write it more generic and formal and edit the question ;) – Oscar Mederos Jan 31 '11 at 10:23
  • What is the use case behind this? Even if you have your "optimal" sequence of renamings, is it really faster than just renaming all pairs and then sort them anew? And you would have to find the sequence, too... – Raphael Jan 31 '11 at 11:01
  • 1
    @Marcos, @Tsuyoshi I've edites the question. I think it's more clear now for people who haven't seen it yet. @Raphael It is a problem. In fact, I edited it so there are no files now. – Oscar Mederos Feb 01 '11 at 06:54
  • Great reformulation, good work. Note that your order is commonly known as lexicographic order.("It would be great to find a polynomial time solution to this problem, or a reduction of an NP-Complete problem into it." both would be beyond great ;)) – Raphael Feb 01 '11 at 09:43
  • If this is NP-complete, then I guess a possible way to prove that is to build appropriate gadgets for representing local conditions of an NP-complete problem and for connecting them to represent the global conditions. So here is a question: (how) can we represent a single 3SAT clause using this problem? – Kaveh Feb 01 '11 at 16:15

2 Answers2

16

... I searched some patterns to build a reduction from a NPC problem, but didn't find a way to represent a "flow" with a "fork" ...

So (after some work) this is a polynomial algorithm ...

ALGORITHM

The starting list can be viewed as an array of $N*2$ consecutive "holes". For each initial pair $(a_j,b_j)$, put the "element" $b_j$ at hole number $a_j$. Each pair can be viewed as a directed edge from position $a_j$ to position $b_j$. A move consists in picking an element $b_j$ at position $a_j$ and moving it to its destination position $b_j$ (the destination hole becomes an unmovable peg). We delete the edge, and proceed to choose the next move which will start from one of the two nearest reachable elements $b_k$ from position $b_j$ (only holes between $b_j$ and $b_k$ are allowed). We must find a sequence of $N$ consecutive moves.

  • For each $(a_j,b_j)$ consider $b_j$ (at array position $a_j$) as the starting element $start$.

    • For each $(a_k,b_k), a_k \neq a_j $ consider $a_k$ as the final element $end$ (the edge from position $a_k$ to position $b_k$ will be the final edge).

      • generate a sequence of moves from $start$ using the following criteria until you reach element $end$ (and a solution has been found), or a stop condition

When you make a move you fix a peg at position $b_j$ and the array is splitted in two partitions $L$ (left) and $R$ (right) and the only way to go from $L$ to $R$ (or from $R$ to $L$) is using an edge that jump across the peg. Set

  • $edgesLR$ = number of edges from left to right (do not count the final edge)
  • $edgesRL$ = number of edges from right to left (do not count the final edge)
  • $flow$ = $edgesLR - edgesRL$

Cases:

A) if $| flow | > 1$ then one of the two partitions will become unreachable, stop

Now suppose that $end > b_j$, i.e. $end \in R$

B) if $flow = 1$ then there is an extra edge from left to right, you must go left (pick the nearest element of $L$), otherwise you will never reach $end$

C) if $flow = -1$ then there is an extra edge from right to left and whatever node you pick you will never reach $end$, stop

D) if $flow = 0$ you must go right (pick the nearest element of $R$), otherwise you will neve reach $end$

If $end < b_j$ ($end \in L$), B,C,D are inverted.

NOTE: when moving left or right, you must consider $end$ as a peg. For example, if you must go right, but the nearest element on $R$ is $end$ then the move is impossible (and you must proceed with another pair $(start,end)$)

Apply the same resoning at every move.

COMPLEXITY

The flows over each hole can be precalculated in O(N) and reused at every scan.

The loops are:

for start = 1 to N
  for end = 1 to N
    for move = 1 to N
      make a move (fix a peg and update flows)
      check if another move can be done using flow     

No choices are made during the computation, so the complexity of the algorithm is $O(N^3)$

CODE

This is a working Java implementation of the algorithm:

public class StrangeSort {
    static int PEG = 0xffffff, HOLE = 0x0;
    static int M = 0, N = 0, choices = 0, aux = 0, end;
    static int problem[][], moves[], edgeflow[], field[];    
    boolean is_hole(int x) { return x == HOLE; }
    boolean is_peg(int x) { return x == PEG; }
    boolean is_ele(int x) { return ! is_peg(x) && ! is_hole(x); };
    int []cp(int src[]) { // copy an array
        int res[] = new int[src.length];
        System.arraycopy(src, 0, res, 0, res.length);
        return res;
    }    
    /* find the first element on the left (dir=-1) right (dir=1) */
    int find(int pos, int dir, int nm) {
        pos += dir;
        while (pos >= 1 && pos <= M ) {
            int x = field[pos];
            if ( is_peg(x) || (pos == end && nm < N-1) ) return 0;
            if ( is_ele(x) ) return pos;
            pos += dir;
        }
        return 0;
    }
    void build_edges() {
        edgeflow = new int[M+1];
        for (int i = 1; i<=M; i++) {
            int start = i;
            int b = field[start];
            if (! is_ele(b)) continue;
            if (i == end) continue;
            int dir = (b > start)? 1 : -1;
            start += dir;
            while (start != b) { edgeflow[start] += dir; start += dir; }
        }
    }
    boolean rec_solve(int start, int nm) {
        boolean f;
        int j;
        int b = field[start];
        moves[nm++] = b;
        if (nm == N) return true;
        //System.out.println("Processing: " + start + "->" + field[start]);        
        field[start] = HOLE;
        field[b] = PEG;
        int dir = (b > start)? 1 : -1;
        int i = start + dir;
        while (i != b) { edgeflow[i] -= dir; i += dir; } // clear edge                
        int flow = edgeflow[b];
        if (Math.abs(flow) > 2) return false;
        if (end > b) {
            switch (flow) {
            case 1 :                    
                j = find(b,-1,nm);
                if (j <= 0) return false;
                return rec_solve(j,nm);
            case -1 :
                return false;
            case 0 :          
                j = find(b,1,nm);
                if (j <= 0) return false;
                return rec_solve(j,nm);
            }        
        } else {
            switch (flow) {
            case -1 :                    
                j = find(b,1,nm);
                if (j <= 0) return false;
                return rec_solve(j,nm);
            case 1 :
                return false;
            case 0 :          
                j = find(b,-1,nm);
                if (j <= 0) return false;
                return rec_solve(j,nm);
            }            
        }
        return false;
    }
    boolean solve(int demo[][]) {
        N = demo.length;
        for (int i = 0; i < N; i++)
            M = Math.max(M, Math.max(demo[i][0], demo[i][1]));
        moves = new int[N];
        edgeflow = new int[M+1];
        field = new int[M+1];
        problem = demo;        
        for (int i = 0; i < problem.length; i++) {
            int a = problem[i][0];
            int b = problem[i][1];
            if ( a < 1 || b < 1 || a > M || b > M || ! is_hole(field[a]) || ! is_hole(field[b])) {
                System.out.println("Bad input pair (" + a + "," + b + ")");
                return false;
            }
            field[a] = b;
        }
        for (int i = 1; i <= M; i++) {
            end = i;
            build_edges();
            if (!is_ele(field[i])) continue;
            for (int j = 1; j <= M; j++) {
                if (!is_ele(field[j])) continue;
                if (i==j) continue;
                int tmp_edgeflow[] = cp(edgeflow);
                int tmp_field[] = cp(field);
                choices = 0;
                //System.out.println("START: " + j + " " + " END: " + i);
                if (rec_solve(j, 0)) {
                    return true;
                }
                edgeflow = tmp_edgeflow;
                field = tmp_field;
            }
        }
        return false;
    }
    void init(int demo[][]) {

    }
    public static void main(String args[]) {
        /**** THE INPUT ********/        

        int demo[][] =  {{4,2},{5,7},{6,3},{10,12},{11,1},{13,8},{14,9}};

        /***********************/        
        String r = "";
        StrangeSort sorter = new StrangeSort();       
        if (sorter.solve(demo)) {
            for (int i = 0; i < N; i++) { // print it in clear text
                int b =  moves[i];
                for (int j = 0; j < demo.length; j++)
                    if (demo[j][1] == b)
                        r += ((i>0)? " -> " : "") + "(" + demo[j][0] + "," + demo[j][1] + ")";
            }             
            r = "SOLUTION: "+r;
        }
        else
            r = "NO SOLUTIONS";
        System.out.println(r);
    }    
}
Marzio De Biasi
  • 22,915
  • 2
  • 58
  • 127
  • This is an interesting approach. In general, whenever you use an edge $(a,b)$, there must be equal (or differing by one) numbers of unused edges crossing $b$ in each direction; and if the numbers differ by one, you know which edge you must take next. When the numbers are equal, you have a choice, which you must resolve by testing both options. It seems like an efficient enough search strategy, but how do you know it's polynomial in the worst case? I.e., how do you know you will only encounter $O(\log n)$ choices where the number of unused crossing edges in each direction is equal? – mjqxxxx Feb 03 '11 at 18:24
  • @mjqxxxx ... I rewrote the whole answer to match the Java algorithm ... – Marzio De Biasi Feb 04 '11 at 16:12
  • @mjqxxxx ... ok, finally I got it ... :-) – Marzio De Biasi Feb 05 '11 at 13:59
  • @Oscar .. is it enough for the bounty :-))) – Marzio De Biasi Feb 05 '11 at 18:18
  • @Vor I've been busy. Today I will take a deeper look at the solution you provided and will translate the code to C#. I will also keep it running with random test cases to test against the brute-force algorithm I have ;) You have done a great job, so I think the bounty is yours!! – Oscar Mederos Feb 05 '11 at 18:50
  • @Oscar ok! ... I already checked the Java class StrangeSort against another program that uses brute force, and it worked well ... it can be optimized a little bit (build_edges() can be called only once and its result reused, and you can use a simpler "for loop" instead of the recursive rec_solve() ...) however the complexity stays O(N^3). Let me know if you want further help or explanations. – Marzio De Biasi Feb 05 '11 at 22:55
  • @Oscar did you check the algorithm? – Marzio De Biasi Feb 08 '11 at 09:01
  • 2
    This looks correct and very elegant to me. Once you use an edge $(a,b)$, you can no longer "walk" across $b$; the only remaining transitions across $b$ are the unused "jumps" (directed edges) crossing it, all of which you must use. If the final edge $(a_n, b_n)$ is specified, then you need to wind up on the same side of $b$ as $a_n$. There is then only one possible direction to walk after each edge, since an odd (even) number of jumps will leave you on the opposite (same) side you initially walked to. So testing each choice of starting and ending edges can be done in polynomial time. – mjqxxxx Feb 08 '11 at 22:06
  • One typo: where you say "consider $b_k$ as the final element $end$", I think you need to say $a_k$ instead. – mjqxxxx Feb 08 '11 at 22:08
  • @Vor I will deep into it today and I will let you know! Anyway, I think the best answer was yours, and I'm sure it will work! :) – Oscar Mederos Feb 09 '11 at 01:30
  • @mjqxxxx thanks. corrected to "... consider $b_k$ (at array position $a_k$) as the final element ..." – Marzio De Biasi Feb 09 '11 at 07:30
  • 1
    This is a beautiful algorithm. It never occurred to me to fix the last move first. Minor points: (1) As mjqxxxx wrote, end must be a_k. Otherwise the condition “end>b_j” is wrong. (2) Either the definition of “flow” has to be negated, or the cases B and C have to be swapped. – Tsuyoshi Ito Feb 10 '11 at 00:51
  • @Tsuyoshi (1) OK! finally I saw the confusing stuff ... in the condition check end is of course position a_k (which contains element b_k) (2) OK! ... (CODE vs DESCRIPTION 2-0 ) :-) Thank you! – Marzio De Biasi Feb 10 '11 at 09:08
10

This is not a solution, but a reformulation that avoids explicit mention of the swapping and sorting operations. Start by sorting the entire combined list of filenames and their swapped versions, and identify each filename with its index in that list. Then two files are neighbors if all the old filenames between them have already been destroyed, and if none of the new filenames between them have been created yet. The reformulated problem is the following:

Given a set of $n$ disjoint directed edges $(a, b)$ with $a, b \in \{1,2,…,2n\}$, is there an ordering $(a_1, b_1), (a_2, b_2),...,(a_n, b_n)$ of these edges such that

  • if $a_j$ is between $b_i$ and $a_{i+1}$, then $j \le i$, and
  • if $b_j$ is between $b_i$ and $a_{i+1}$, then $j \ge i+1$?
mjqxxxx
  • 1,468
  • 13
  • 21
  • 2
    +1. This is a much simpler way to state the equivalent problem. Just one clarification: the edges (a,b) are directed (in the sense that the edge (a,b) and the edge (b,a) have different meanings). – Tsuyoshi Ito Feb 01 '11 at 18:53
  • @Tsuyoshi: thanks; I edited to say 'directed'. – mjqxxxx Feb 01 '11 at 19:10
  • As I understand the phrase "$b$ is between $a$ and $c$" means $a\leqslant b\leqslant c$. So I guess it's worth changing the former notation by the latter one. – Oleksandr Bondarenko Feb 02 '11 at 18:43
  • @Oleksandr: Here “b is between a and c” means “either a<b<c or c<b<a.” – Tsuyoshi Ito Feb 02 '11 at 19:21